Abstract. The Kimura 3-parameter model on a tree of n leaves is one of the most 
widely used in phylogenetics. The affine algebraic variety W associated to it is a toric 
variety. We study its geometry and we prove that it is isomorphic to a geometric 
quotient of the affine space by a finite group acting on it. As a consequence, we 
are able to study the singularities of W and prove that the biologically meaningful 
points are smooth points. Then we give an algorithm for constructing a set of 
minimal generators of the localized ideal at these points, for an arbitrary number 
of leaves n. This leads to a major improvement of phylogenetic reconstruction 
methods based on algebraic geometry. 



1. Introduction 

The goal of phylogenetic algebraic geometry is to translate the knowledge of alge- 
braic geometry into new tools for phylogenetic inference problems. The dictionary 
used in this translation is based on algebraic statistics, which allows viewing statisti- 
cal evolutionary models as algebraic varieties. The first approaches in this direction 
are due to Allman and Rhodes [AR03] and Pachter and Sturmfels [PS04]. Since then, 
many other authors have contributed to the development of phylogenetic algebraic 
geometry, either from the more geometric point of view (see for instance [ERSS05], 
[SS05], [AR07], [WB06], [CS05]) or from the applied standpoint ([ES93], [Eri05], 
[CFS07]). The foundations of algebraic statistics for computational biology were finally 
laid in the book [PS05]. 

The applications of algebraic geometry to phylogenetics rely on the computation 
of the generators of the ideal of the algebraic variety associated to a statistical 
evolutionary model on a phylogenetic tree T. In phylogenetics, these generators are 
called phylogenetic invariants as they are useful to infer the topology of the tree 
T (note that in phylogenetics, topology refers to the topology of the graph T with 
labels at the leaves). Phylogenetic invariants have been given for some algebraic 
evolutionary models, namely the general Markov model ([AR07]) and group-based 
models (the Kimura models [Kim80], [Kim81] and the Jukes-Cantor model [JC69]). 

In this paper, we deal with the Kimura 3-parameter model. As it was shown in 
[SS05], the Kimura 3-parameter model on a tree of n species is a toric variety in a 
suitable coordinate system (Fourier coordinates). Sturmfels and Sullivant gave an 
algorithm to construct a set of generators of the ideal of this variety for any number 
of species n. For example, for four species a set of minimal generators contains 
8002 binomials of degrees 2, 3 and 4. In a previous paper, we proved that this 
set of binomials can be successfully used for phylogenetic inference (see [CFS07]). 
However, this is a large number of generators if one considers that the codimension 
of the variety is 48. Moreover, as the number of species increases, the codimension 
increases exponentially but the number of generators given in [SS05] increases more 
than exponentially. This makes phylogenetic reconstruction methods based on this 
set of generators infeasible for larger trees. 
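
The codimension quoted above can be checked by simple counting. The sketch below is illustrative only: it assumes that W has dimension 3(2n - 3) (three free Fourier parameters on each of the 2n - 3 edges) and that it sits inside the 4^(n-1) - 1 Fourier coordinates that remain once q_{A...A} = 1 is imposed. The function name `kimura_codimension` is hypothetical.

```python
def kimura_codimension(n: int) -> int:
    """Codimension of W in the hyperplane q_{A...A} = 1, under the assumptions above."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    ambient = 4 ** (n - 1) - 1   # Fourier coordinates with x_1 + ... + x_n = 0, minus q_{A...A}
    dim_w = 3 * (2 * n - 3)      # assumed: three free parameters on each of the 2n - 3 edges
    return ambient - dim_w


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, kimura_codimension(n))
```

For n = 4 this count gives 63 - 15 = 48, in agreement with the value stated in the text.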

The main goal here is to prove that the points of biological interest are smooth 
points of the algebraic variety and to provide the generators of a local complete 
intersection at these points. To this end, we prove that this Kimura variety W is 
isomorphic to the quotient of a certain affine space under the action of a finite group 
(Corollary 3.7). This result allows the study of the singular locus of W and shows 
that there are no singularities at the points with biological meaning (Corollary 3.13). 
We use this result in section 4, where we provide a recursive procedure to give a 
minimal sequence of generators for the variety W near these points (Theorem 4.5). 
As an example, the whole list of these generators in the case of trees with 4 leaves 
is given (Example 4.8). 

In the paper [SSEW93] the authors also provided a local complete intersection for 
the Kimura 3-parameter model (they called it a complete collection of invariants). 
In their case the degree of the generators increases exponentially with the number of 
leaves n, which makes it infeasible to use in a phylogenetic reconstruction 
method for large trees. Our set of generators for the local complete intersection 
consists of binomials of degree 2 and 4 for any number of leaves n and leads to some 
hope for the generalization of the method given in [CGS05] to arbitrary trees. It is 
worth mentioning that Hagedorn [Hag00] also realized that there exists an open set 
of the variety in which it is sufficient to consider a local complete intersection (this 
is clear if one knows that the set of singular points of a variety forms a Zariski closed 
subset). However, he did not specify the open subset or the set of generators. 

This paper is organized as follows. In section 2 we review the relation between al- 
gebraic geometry and statistical evolutionary models for phylogenetic inference. In 
this section as well, we recall the discrete Fourier transform (or Hadamard conjugation) 
introduced by Evans and Speed (see [ES93]) as a linear change of coordinates 
which diagonalizes group-based models. Then we introduce the algebraic varieties 
we are interested in and we set up notation used in the sequel. Section 3 is devoted 
to the global study of the geometry of the Kimura variety and to determining its 
singular points. In section 4 we perform a local study of the variety at the 
biologically meaningful points and we give an algorithm to obtain the generators of a local 
complete intersection at these points. 

Acknowledgments: The first author would like to thank L. Pachter and B. 
Sturmfels for introducing her to this subject and encouraging her to work on it. 
The second author is deeply grateful to M. Casanellas for her warm welcome to 
this topic and for giving him the opportunity of working together. 

2. Preliminaries 

Let T be a tree (i.e. a connected undirected acyclic graph) of n leaves labelled as 
1,2, ... ,n. The degree of a node in T is defined as the number of edges incident to 
it. Nodes of degree one are leaves L(T), while the others are internal nodes N(T). 
We assume that our trees are trivalent, i.e. internal nodes have degree 3, and we 
call E(T) the set of edges in T. An edge in T is said to be terminal if it contains 
one leaf. We write $e_l$ for the terminal edge ending at leaf $l$, $l \in \{1, \dots, n\}$. It is easy 
to see that the number of internal nodes is $|N(T)| = n - 2$ and the number of edges 
is $|E(T)| = 2n - 3$. 
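
The two counts above can be checked on examples. The following sketch grows a trivalent tree from the 3-leaf claw by repeatedly subdividing an edge and attaching a new leaf; this construction and the names `random_trivalent_tree` and `node_degrees` are illustrative assumptions, not part of the paper.

```python
import random


def random_trivalent_tree(n, seed=0):
    """Edge list of an unrooted trivalent tree with n >= 3 leaves.

    Leaves are labelled 1..n; interior nodes get string labels such as 'v0'.
    """
    rng = random.Random(seed)
    edges = [(1, "v0"), (2, "v0"), (3, "v0")]            # the 3-leaf claw
    for leaf in range(4, n + 1):
        a, b = edges.pop(rng.randrange(len(edges)))      # edge to subdivide
        new = f"v{leaf - 3}"                             # new interior node
        edges += [(a, new), (new, b), (leaf, new)]
    return edges


def node_degrees(edges):
    """Map each node to the number of edges incident to it."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


if __name__ == "__main__":
    for n in (3, 4, 7):
        edges = random_trivalent_tree(n)
        interior = [v for v, d in node_degrees(edges).items() if d == 3]
        print(n, len(interior), len(edges))
```

Each step removes one edge and adds three, and creates exactly one interior node, which is where the formulas n - 2 and 2n - 3 come from.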

2.1. Algebraic evolutionary models. In phylogenetics, a tree T represents the 
ancestral relationships (edges) among a set of species (nodes). The leaves L(T) 
represent the current species whose phylogenetic history we wish to infer. The input 
data is an alignment of n sequences in the alphabet $\Sigma := \{A, C, G, T\}$ (representing 
nucleotides) of length N, and one needs to infer the correct phylogenetic tree that 
produced the observed alignment. 

In order to explain the relationship between phylogenetic inference and algebraic 
geometry it is useful to assume for the moment that the tree is rooted. That is, the 
graph T is directed and it has a unique non-trivalent node, called the root of the tree, 
with two edges emerging from it. This assumption will be removed in subsection 2.3. 

From the biological standpoint, the Kimura 3-parameter model is a stationary 
Markov model of evolution. Kimura [Kim81] proposed a statistical model of evolution 
under the following assumptions: all sites in the n sequences evolve equally 
and independently, the distribution of nucleotides at the root is uniform and the 
tree is stationary (and hence all nodes of the tree have uniform distribution of 
nucleotides), the evolution of a species depends only on the node immediately preceding 
it, mutations occur randomly and with strictly positive probabilities, and transitions 
(mutations between the purines A, G or between the pyrimidines C, T) occur more often than 
transversions (mutations between purines and pyrimidines). As all sites evolve 
independently and in the same way, one restricts the model to one site. We describe 
here an algebraic version of this model (see the books [PS05] and [AR04] for an 
introduction to the algebraic versions of evolutionary models). 

In statistical evolutionary models, to each node $v$ of the tree T we associate a 
discrete random variable $X_v$ that takes values in $\Sigma = \{A, C, G, T\}$. The parameters 
of algebraic evolutionary models are the substitution probabilities of nucleotides 
along each edge. These parameters are written in a matrix indexed by the 
elements of $\Sigma$, so that the matrix $S^e$ associated to the edge $e$ is 

\[
S^e = \begin{pmatrix}
P(A|A,e) & P(C|A,e) & P(G|A,e) & P(T|A,e) \\
P(A|C,e) & P(C|C,e) & P(G|C,e) & P(T|C,e) \\
P(A|G,e) & P(C|G,e) & P(G|G,e) & P(T|G,e) \\
P(A|T,e) & P(C|T,e) & P(G|T,e) & P(T|T,e)
\end{pmatrix}
\]

(rows and columns are indexed by A, C, G, T, in this order), 
where $S^e_{y,x} = P(x \mid y, e)$ is the probability that nucleotide $y$ at the parent of edge $e$, 
$s(e)$, mutates to nucleotide $x$ at the descendant node $t(e)$. Then the probability of 
observing nucleotides $x_1 \dots x_n$ at the leaves of the tree is given by a Markov process: 

\[
p_{x_1 \dots x_n} = \sum_{(x_v)_{v \in N(T)},\ x_v \in \Sigma} \ \prod_{e \in E(T)} S^e_{x_{s(e)}, x_{t(e)}} \tag{2.1}
\]

where we assume that if $e = e_l$ is a terminal edge, then $x_{t(e)} = x_l$. In the Kimura 
3-parameter model the substitution matrices have the following form 

\[
S^e = \begin{pmatrix}
a^e & b^e & c^e & d^e \\
b^e & a^e & d^e & c^e \\
c^e & d^e & a^e & b^e \\
d^e & c^e & b^e & a^e
\end{pmatrix}
\]

where $a^e + b^e + c^e + d^e = 1$. This model includes the more restrictive models of 
Jukes-Cantor (where $b^e = c^e = d^e$, [JC69]) and the Kimura 2-parameter model ($b^e = d^e$, 
[Kim80]). 

In the algebraic geometry setting, the Kimura 3-parameter model is given by the 
polynomial map 

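\[
\prod_{e \in E(T)} \Delta^3 \longrightarrow \Delta^{4^n - 1}, \qquad
\big( (a^e, b^e, c^e, d^e) \big)_{e \in E(T)} \longmapsto (p_{x_1 \dots x_n})_{x_1, \dots, x_n \in \Sigma}
\]

where $p_{x_1 \dots x_n}$ is given by (2.1) and $\Delta^d$ denotes the standard $d$-dimensional simplex 
in $\mathbb{R}^{d+1}$. As we are interested in algebraic varieties, instead of restricting to the 
simplex, we also consider this polynomial map as 

\[
\prod_{e \in E(T)} \mathbb{C}^4 \longrightarrow \mathbb{C}^{4^n}, \qquad
\big( (a^e, b^e, c^e, d^e) \big)_{e \in E(T)} \longmapsto (p_{x_1 \dots x_n})_{x_1, \dots, x_n \in \Sigma}.
\]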

One of the goals in phylogenetic algebraic geometry is determining the ideal of 
the closure of the image of this polynomial map. Knowing the generators of this 
ideal provides tools for phylogenetic inference. See for example [CFS07] and [Eri05], 
where some of these methods for phylogenetic inference have been proposed. In 
order to find the generators of this ideal, it is extremely useful to perform a change of 
coordinates, as we explain below. 

2.2. Fourier transform. The models described above are known as group-based 
models because if the nucleotides are thought of as the elements of the group 

H = Z/(2) x Z/(2) 

(namely $A = (0,0)$, $C = (1,0)$, $G = (0,1)$, $T = (1,1)$) then the entries $\{S^e_{g,h}\}_{g,h \in H}$ 
in the substitution matrices $S^e$ can be expressed as functions on the group, $f^e(h - g)$ 
(see [SS05] for details). For the Kimura 3-parameter model, the function $f^e$ is 

\[
f^e(h) = \begin{cases}
a^e & \text{if } h = (0,0) \\
b^e & \text{if } h = (1,0) \\
c^e & \text{if } h = (0,1) \\
d^e & \text{if } h = (1,1)
\end{cases}
\]

As a consequence, probabilities $p_{x_1 \dots x_n}$ can also be thought of as functions on 
$H \times \dots \times H$. In what follows, when we add nucleotides, we mean addition in the group 
H. 

One of the main properties of group-based models is that a discrete Fourier 
transform simplifies the expression of the probabilities (2.1). We briefly recall 
how this Fourier transform works and we refer to [SS05] and [CGS05] for more 
details. Given a function $f : H \to \mathbb{C}$, its discrete Fourier transform is the function 
$\hat f : H^\vee = \mathrm{Hom}(H, \mathbb{C}^*) \to \mathbb{C}$ defined by 

\[
\hat f(\chi) = \sum_{g \in H} \chi(g) f(g).
\]

The Fourier transform turns convolution into multiplication, and this allows us to 
simplify the expression of joint probabilities: 

Theorem 2.1 (Evans-Speed [ES93]). Let $p(g_1, \dots, g_n)$ be the joint distribution of a 
group-based model for a phylogenetic tree T. Then its Fourier transform has the form 

\[
q(\chi_1, \dots, \chi_n) = \prod_{e \in E(T)} \hat f^e \Big( \prod_{l \in \{\text{leaves below } e\}} \chi_l \Big). \tag{2.2}
\]

As H and its dual $H^\vee$ are isomorphic groups, we can identify $(\chi_1, \dots, \chi_n)$ with 
the corresponding tuple $(g_1, \dots, g_n)$. From now on, $q(\chi_1, \dots, \chi_n)$ will be denoted as 
$q_{g_1 \dots g_n}$. In the additive notation of the group H, one can rewrite expression (2.2) as 

\[
q_{g_1 \dots g_n} = \prod_{e \in E(T)} \hat f^e(m(e))
\]

where $m(e) = \sum_{l \in \{\text{leaves below } e\}} g_l$. The Fourier transform is a linear coordinate 
change given by 

\[
q_{g_1 \dots g_n} = \sum_{j_1, \dots, j_n \in H} \chi^{g_1}(j_1) \cdots \chi^{g_n}(j_n) \, p_{j_1 \dots j_n} \tag{2.3}
\]

where $\chi^{i}$ is the character of the group associated to the $i$th group element: 





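\[
\begin{array}{c|rrrr}
       & A & C & G & T \\ \hline
\chi^A & 1 & 1 & 1 & 1 \\
\chi^C & 1 & -1 & 1 & -1 \\
\chi^G & 1 & 1 & -1 & -1 \\
\chi^T & 1 & -1 & -1 & 1
\end{array}
\]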

In this new coordinate system, the Fourier transforms $\hat f^e$ of the substitution 
functions are the new parameters of the model. For the Kimura 3-parameter model, 
these Fourier transforms become 

\[
\hat f^e(h) = \begin{cases}
a^e + b^e + c^e + d^e & \text{if } h = (0,0) \\
a^e - b^e + c^e - d^e & \text{if } h = (1,0) \\
a^e + b^e - c^e - d^e & \text{if } h = (0,1) \\
a^e - b^e - c^e + d^e & \text{if } h = (1,1)
\end{cases}
\]

As before, we think of these substitution functions as matrices. Therefore, the 
parameters in Fourier coordinates are diagonal matrices 

\[
P^e = \begin{pmatrix}
P^e_A & & & \\
 & P^e_C & & \\
 & & P^e_G & \\
 & & & P^e_T
\end{pmatrix}
\]

where $P^e_A = a^e + b^e + c^e + d^e$, $P^e_C = a^e - b^e + c^e - d^e$, 
$P^e_G = a^e + b^e - c^e - d^e$ and $P^e_T = a^e - b^e - c^e + d^e$. 
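
The following sketch checks these formulas on a numerical example. It builds the Kimura matrix from the group function f(h - g), computes the Fourier parameters from the character table, and verifies that conjugating the matrix by the character table yields the diagonal matrix above. All function names are hypothetical, and the parameter values are arbitrary test data.

```python
from fractions import Fraction

H = [(0, 0), (1, 0), (0, 1), (1, 1)]   # A, C, G, T as elements of Z/(2) x Z/(2)


def add(g, h):
    """Addition in H (every element is its own inverse, so h - g = h + g)."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)


def chi(g, h):
    """Value at h of the character of H indexed by g (always +1 or -1)."""
    return (-1) ** (g[0] * h[0] + g[1] * h[1])


def kimura_matrix(a, b, c, d):
    """Substitution matrix with entries f(h - g); rows and columns ordered A, C, G, T."""
    f = dict(zip(H, (a, b, c, d)))
    return [[f[add(g, h)] for h in H] for g in H]


def fourier_parameters(a, b, c, d):
    """Fourier transform of f, returned as [P_A, P_C, P_G, P_T]."""
    f = dict(zip(H, (a, b, c, d)))
    return [sum(chi(g, h) * f[h] for h in H) for g in H]


def conjugate_by_characters(S):
    """Compute K S K / 4, where K is the character table (K is symmetric and K K = 4 I)."""
    K = [[chi(g, h) for h in H] for g in H]
    KS = [[sum(K[r][m] * S[m][c] for m in range(4)) for c in range(4)] for r in range(4)]
    return [[sum(KS[r][m] * K[m][c] for m in range(4)) / 4 for c in range(4)]
            for r in range(4)]


if __name__ == "__main__":
    a, b, c, d = Fraction(7, 10), Fraction(1, 10), Fraction(3, 20), Fraction(1, 20)
    print(fourier_parameters(a, b, c, d))
    print(conjugate_by_characters(kimura_matrix(a, b, c, d)))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with the closed formulas is not affected by rounding.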



From now on, $P^e$ will indistinctly denote this diagonal matrix or the vector 
$(P^e_A, P^e_C, P^e_G, P^e_T)$ and we will restrict to Fourier coordinates. It is not difficult to 
see that if $g_1 + \dots + g_n \neq 0$, then $q_{g_1 \dots g_n} = 0$ (cf. Proposition 29 of [?]). 
Therefore, the polynomial map we are interested in is 

\[
\varphi : \prod_{e \in E(T)} \mathbb{C}^4 \longrightarrow \mathbb{C}^{4^{n-1}}, \qquad
(P^e_A, P^e_C, P^e_G, P^e_T)_{e \in E(T)} \longmapsto (q_{x_1 \dots x_n})_{x_1, \dots, x_n} \tag{2.4}
\]

where $x_1 + \dots + x_n = 0$, $q_{x_1 \dots x_n} = \prod_{e \in E(T)} P^e_{m(e)}$ and $m(e_l) = x_l$ if $e_l$ ends at leaf $l$. 

Notation 2.2. The image of the standard simplex $\Delta^d$ under the Fourier change of 
coordinates will be denoted by $\widehat{\Delta}^d$. For a picture of $\widehat{\Delta}^3$, see Figure 3. Notice that the 
Fourier change of parameters transforms the hyperplane $a^e + b^e + c^e + d^e = 1$ into $P^e_A = 1$, 
for all $e \in E(T)$. As we will be interested in the coordinates $\{q_{x_1 \dots x_n}\}_{x_1 + \dots + x_n = 0}$, we 
will focus on the simplex $\widehat{\Delta}^{4^{n-1} - 1}$, which coincides with the projection of the Fourier 
transform of the simplex $\Delta^{4^n - 1}$ onto this set of coordinates. As before, note that 
the Fourier change of coordinates transforms the hyperplane $\sum p_{x_1 \dots x_n} = 1$ 
into $q_{A \dots A} = 1$. 

Figure 1. (a) The three edges $e^1_v, e^2_v, e^3_v$ coincident at an interior 
node $v \in N(T)$. (b) The two trees obtained from T in the proof of 
Lemma 2.4. 

2.3. Kimura variety. In subsections 2.1 and 2.2 we have assumed that the tree is 
rooted. However, in the Kimura 3-parameter model the matrices $S^e$ are symmetric 
and therefore parameterization (2.4) does not depend on the orientation of the tree 
or the position of the root. One can even think of the root as being one of the leaves 
and in this case the root is one of the observed variables. In what follows we will 
consider unrooted trees. This is due to the issue of identifiability that induces the 
use of unrooted trees for phylogenetic inference. Roughly speaking, the question 
addressed by identifiability is whether observation data of character states at the 
leaves of the tree contain enough information to recover the topology and 
the parameters of the model (see [Cha96]). This means that there exists precisely 
one topology and one set of parameters of the model that explain the data. The 
identifiability of the Kimura 3-parameter model was established by Steel, Hendy 
and Penny in [SHP98]. 

The reformulation of the parameterization in Fourier coordinates for unrooted 
trees is given by Lemma 2.4 below. Before stating it, we introduce some notation: 

Notation 2.3. Given an interior node $v \in N(T)$, denote by $e^1_v, e^2_v, e^3_v$ the three edges 
coincident at $v$ (see (a) in Figure 1). Given three elements of the group 
$x_{e^1_v}, x_{e^2_v}, x_{e^3_v} \in H$ associated to these edges, we define 

\[
x(v) := x_{e^1_v} + x_{e^2_v} + x_{e^3_v}
\]

as a sum in H. 

Lemma 2.4. Let T be an (unrooted) tree with n leaves. Then the parameterization 
of the Kimura 3-parameter model in Fourier coordinates is given by 

\[
q_{x_1 \dots x_n} = \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P^e_{x_e} \tag{2.5}
\]

where $x_1 + \dots + x_n = 0$ and $x_{e_l} = x_l$ if $e_l$ is the terminal edge corresponding to the 
leaf $l$. 

Proof. We already know that the parameterization for rooted trees is independent 
of the root placement, so we root the tree T at leaf 1. Then we need to prove that 

\[
\prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P^e_{x_e} = \prod_{e \in E(T)} P^e_{m(e)}.
\]

In other words, we want to prove that the condition $x(v) = 0$ for all interior nodes 
$v \in N(T)$ is equivalent to $x_e = m(e)$ for all edges $e \in E(T)$. 

We first assume that $x(v) = 0$ for all interior nodes $v$. We proceed by induction 
on the number of leaves of the tree. Let $w$ be the only node next to the leaf 1 and let 
$e^1_w$ be the terminal edge connecting 1 and $w$ (so $e^1_w = e_1$). Write $T_2$ and $T_3$ for the two connected 
components obtained when removing $e^1_w$ from T and adding $w$ as a root (see (b) of 
Figure 1). By the induction hypothesis on $T_2$ and $T_3$, we have $x_{e^2_w} = m(e^2_w)$, $x_{e^3_w} = m(e^3_w)$, 
and $x_e = m(e)$ for all edges but $e_1$. It remains to check that $x_{e_1} = m(e_1)$. This 
follows using the hypothesis that $x(w) = 0$. Indeed, $x(w) = 0$ implies $x_1 = x_{e^2_w} + x_{e^3_w}$, 
which in turn is equal to $m(e^2_w) + m(e^3_w)$ and hence equal to $x_2 + \dots + x_n$. 

In order to prove the converse we assume that $x_e = m(e)$ for all edges $e \in E(T)$. 
By the induction hypothesis on the trees $T_2$ and $T_3$, the condition $x(v) = 0$ holds for 
all interior nodes in T but $w$. We just need to show that $x_{e_1} + x_{e^2_w} + x_{e^3_w} = 0$. Our 
hypothesis implies that $x_{e_1} + x_{e^2_w} + x_{e^3_w} = m(e_1) + m(e^2_w) + m(e^3_w)$, and this vanishes 
because $m(e_1) = m(e^2_w) + m(e^3_w)$ by definition of $m$. □ 

Remark 2.5. In expression (2.5), the indices $x_e$ associated to edge $e$ are completely 
determined by condition $x(v) = 0$, $\forall v \in N(T)$. Indeed, as at the terminal edges $e_l$ 
we have imposed $x_{e_l} = x_l$, condition $x(v) = 0$ for nodes that join a cherry to the tree 
determines the value $x_e$ at those edges that join a cherry to the tree. Performing 
the same process from the exterior of the tree to the interior, one assigns a unique 
value to every edge. Condition $x_1 + \dots + x_n = 0$ guarantees that this assignment is 
consistent at all interior nodes. 
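
The procedure of Remark 2.5 can be sketched as follows. The tree encoding (a list of node pairs), the function names `edge_labels` and `is_consistent`, and the 4-leaf example with interior nodes `u` and `w` are illustrative assumptions. At each interior node with exactly one unlabelled edge, the missing label is the sum in H of the other two, because every element of H is its own inverse.

```python
A, C, G, T = (0, 0), (1, 0), (0, 1), (1, 1)


def add(g, h):
    """Addition in H = Z/(2) x Z/(2)."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)


def edge_labels(edges, leaf_label):
    """Propagate leaf labels inwards; returns a dict edge -> element of H."""
    incident = {}
    for e in edges:
        for node in e:
            incident.setdefault(node, []).append(e)
    # Terminal edges inherit the label of their leaf.
    label = {e: leaf_label[node] for e in edges for node in e if node in leaf_label}
    changed = True
    while changed:
        changed = False
        for node, inc in incident.items():
            unknown = [e for e in inc if e not in label]
            if len(inc) == 3 and len(unknown) == 1:
                known = [label[e] for e in inc if e in label]
                label[unknown[0]] = add(known[0], known[1])   # forces x(v) = 0 at this node
                changed = True
    return label


def is_consistent(edges, label):
    """Check x(v) = 0 at every interior (degree 3) node."""
    total, degree = {}, {}
    for e in edges:
        for node in e:
            total[node] = add(total.get(node, (0, 0)), label[e])
            degree[node] = degree.get(node, 0) + 1
    return all(total[v] == (0, 0) for v in total if degree[v] == 3)


QUARTET = [(1, "u"), (2, "u"), ("u", "w"), (3, "w"), (4, "w")]

if __name__ == "__main__":
    labels = edge_labels(QUARTET, {1: C, 2: G, 3: T, 4: A})
    print(labels[("u", "w")], is_consistent(QUARTET, labels))
```

When the leaf labels do not add up to zero, the propagation still assigns a label to every edge, but the final check fails at some interior node, which is the consistency statement at the end of the remark.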

Example 2.6. For the unrooted 3-leaf claw tree (see (a) of Figure 2), the 
parameterization $\varphi$ of (2.4) in Fourier coordinates is given by 

\[
q_{x_1 x_2 x_3} = P^{e_1}_{x_1} P^{e_2}_{x_2} P^{e_3}_{x_3}
\]

if $x_1 + x_2 + x_3 = 0$. 

Figure 2. (a) The 3-leaf claw tree. (b) The 4-leaf trivalent unrooted tree. 

Example 2.7. For the unrooted tree with 4 leaves (see (b) of Figure 2) the 
parameterization in Fourier coordinates is given by 

\[
q_{x_1 x_2 x_3 x_4} = P^{e_1}_{x_1} P^{e_2}_{x_2} P^{e}_{x_1 + x_2} P^{e_3}_{x_3} P^{e_4}_{x_4}
\]

if $x_1 + x_2 + x_3 + x_4 = 0$. 
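
For the 3-leaf claw tree of Example 2.6, the passage from probabilities to Fourier coordinates can be verified numerically. The sketch below is a check under stated assumptions, not the authors' code: the tree is rooted at its interior node, the root distribution is uniform (weight 1/4, as in the model description), and the edge parameters are arbitrary test values. It computes the joint distribution by the Markov sum, applies the change of coordinates (2.3), and compares the result with the monomial parameterization.

```python
from fractions import Fraction
from itertools import product

H = [(0, 0), (1, 0), (0, 1), (1, 1)]   # A, C, G, T


def add(g, h):
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)


def chi(g, h):
    """Character of H indexed by g, evaluated at h."""
    return (-1) ** (g[0] * h[0] + g[1] * h[1])


def claw_distribution(fs):
    """Joint distribution p[(x1, x2, x3)]; fs[i] is the group function f on edge i."""
    return {
        xs: sum(Fraction(1, 4)                        # assumed uniform root distribution
                * fs[0][add(xs[0], r)] * fs[1][add(xs[1], r)] * fs[2][add(xs[2], r)]
                for r in H)
        for xs in product(H, repeat=3)
    }


def fourier_coordinates(p):
    """Apply the linear change of coordinates (2.3) to a distribution on H x H x H."""
    return {
        gs: sum(chi(gs[0], js[0]) * chi(gs[1], js[1]) * chi(gs[2], js[2]) * pj
                for js, pj in p.items())
        for gs in product(H, repeat=3)
    }


def fourier_parameters(f):
    """Fourier transform of one group function, as a dict g -> P_g."""
    return {g: sum(chi(g, h) * f[h] for h in H) for g in H}


ROWS = (("7/10", "1/10", "3/20", "1/20"),
        ("3/5", "1/5", "1/10", "1/10"),
        ("1/2", "1/4", "1/8", "1/8"))
FS = [dict(zip(H, map(Fraction, row))) for row in ROWS]

if __name__ == "__main__":
    q = fourier_coordinates(claw_distribution(FS))
    print(sum(1 for value in q.values() if value != 0), "nonzero Fourier coordinates")
```

With these positive parameters the nonzero coordinates are exactly those indexed by triples with x_1 + x_2 + x_3 = 0, and each equals the product of the three edge parameters.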

Notation 2.8. From now on, we write V for the closure of the image of 

\[
\varphi : \prod_{e \in E(T)} \mathbb{C}^4 \longrightarrow \mathbb{C}^{4^{n-1}}
\]

where 

\[
q_{x_1 \dots x_n} = \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P^e_{x_e}
\]

and $x_{e_l} = x_l$ if $e_l$ is the terminal edge corresponding to leaf $l$. We will denote by $Q_n$ 
the following set of indeterminates 

\[
Q_n = \{ q_{x_1 \dots x_n} \mid x_1 + \dots + x_n = 0 \text{ in } H \}.
\]

The affine coordinate ring $A(V)$ of V is isomorphic to the $\mathbb{C}$-algebra 

\[
\mathbb{C}\Big[ \Big\{ \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P^e_{x_e} \Big\}_{x_1, \dots, x_n \in \Sigma} \Big],
\]

because $A(V)$ is defined as the image of the morphism of $\mathbb{C}$-algebras 

\[
\theta : \mathbb{C}[Q_n] \longrightarrow \mathbb{C}[\{ P^e_x \mid e \in E(T),\ x \in \Sigma \}], \qquad
q_{x_1 \dots x_n} \longmapsto \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P^e_{x_e} \tag{2.6}
\]

where $x_{e_l} = x_l$ for all $l \in L(T)$. The toric ideal defining V is the kernel of this 
morphism and we denote it by $I_V$. Sturmfels and Sullivant gave an algorithm to 
construct a set of generators of $I_V$ in [SS05]. We note that, as the map $\theta$ is 
homogeneous, V is actually a cone over a projective variety. 

The variety we are interested in is $V \cap \{q_{A \dots A} = 1\}$ because the simplex in Fourier 
coordinates is contained in the hyperplane $q_{A \dots A} = 1$. 

Definition 2.9. The Kimura variety of the phylogenetic tree T is 

\[
W := V \cap \{ q_{A \dots A} = 1 \}.
\]

Lemma 2.10. The Kimura variety W is the closure of the image of the parameterization $\varphi$ 
restricted to $\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P^e_A = 1\})$. 

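Proof. Write $\varphi_1$ for this restriction, and let $\rho$ be the morphism of $\mathbb{C}$-algebras 
defined as: 

\[
\rho : \mathbb{C}[Q_n] \longrightarrow \mathbb{C}[Q_n \setminus \{q_{A \dots A}\}], \qquad
f(q_{A \dots A}, q_{A \dots CC}, \dots) \longmapsto f(1, q_{A \dots CC}, \dots).
\]

Then the affine coordinate ring of W is 

\[
A(W) = \mathbb{C}[Q_n] / (I_V + (q_{A \dots A} - 1)) \cong \mathbb{C}[Q_n \setminus \{q_{A \dots A}\}] / \rho(I_V).
\]

On the other hand, note that $\varphi_1$ induces a morphism of $\mathbb{C}$-algebras 

\[
\theta_1 : \mathbb{C}[Q_n \setminus \{q_{A \dots A}\}] \longrightarrow \mathbb{C}[\{ P^e_x \mid e \in E(T) \text{ and } x \in \Sigma \setminus \{A\} \}]
\]

so that the following diagram commutes 

\[
\begin{array}{ccccccc}
0 & \longrightarrow & I_V & \longrightarrow & \mathbb{C}[Q_n] & \stackrel{\theta}{\longrightarrow} & \mathbb{C}[\{ P^e_x \mid e \in E(T) \text{ and } x \in \Sigma \}] \\
 & & \downarrow \rho & & \downarrow \rho & & \downarrow \rho' \\
0 & \longrightarrow & \rho(I_V) & \longrightarrow & \mathbb{C}[Q_n \setminus \{q_{A \dots A}\}] & \stackrel{\theta_1}{\longrightarrow} & \mathbb{C}[\{ P^e_x \mid e \in E(T) \text{ and } x \in \Sigma \setminus \{A\} \}]
\end{array} \tag{2.7}
\]

Here $\rho'$ sends $P^e_A$ to 1 for each $e \in E(T)$ and is the identity on the other 
indeterminates. Write $X \subset \mathbb{C}^{4^{n-1}}$ for the closure of the image of $\varphi_1$. Then, the affine coordinate 
ring of X is $A(X) = \mathbb{C}[Q_n] / \mathrm{Ker}(\rho' \circ \theta)$. Since the above diagram commutes, we 
have that 

\[
\mathrm{Ker}(\rho' \circ \theta) = \mathrm{Ker}(\theta_1 \circ \rho) = \rho^{-1}(\rho(I_V)) = I_V + (q_{A \dots A} - 1)
\]

and $X = W$. □ 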



2.4. Biologically meaningful points. Here, we introduce some notation and we 
give a biological interpretation of the points of some polytopes in the simplices 
defined in subsection 2.2. As above, let T be a phylogenetic tree with n leaves and 
write E(T) for the set of its edges. For any $e \in E(T)$, let $\widehat{\Delta}^3$ be the parameter 
simplex in Fourier coordinates associated to it (see Notation 2.2). Write 

\[
\widehat{\Delta}^3_{\geq 0} = \{ P = (1, P_C, P_G, P_T) \in \widehat{\Delta}^3 \mid P_C, P_G, P_T \geq 0 \}
\]

for the set of non-negative points in $\widehat{\Delta}^3$, i.e. the polytope delimited by the vertex 
(1, 1, 1, 1), the points (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1) and the centroid (1, 0, 0, 0) of 
$\widehat{\Delta}^3$ (see Figure 3), all of them written in Fourier coordinates. We also write 
$\widehat{\Delta}^3_{+} = \{ P \in \widehat{\Delta}^3 \mid P_C, P_G, P_T > 0 \}$. Any point of $\widehat{\Delta}^3_{\geq 0}$ represents a substitution matrix 
$S^e$ such that the probability of no mutation ($a^e$) plus the probability of any one 
mutation in the site ($b^e$, $c^e$ or $d^e$) is at least 1/2. In particular, the 
probability of no mutation is greater than or equal to the probability of any particular 
mutation. Since this is a reasonable hypothesis if we work with realistic data, we 
will call the points in $\prod_{e \in E(T)} \widehat{\Delta}^3_{+}$ the biologically meaningful parameters of the model. 
We also write 

\[
\varphi_+ : \prod_{e \in E(T)} \widehat{\Delta}^3_{+} \longrightarrow W, \qquad
(P^e_A, P^e_C, P^e_G, P^e_T)_{e} \longmapsto (q_{x_1 \dots x_n})_{x_1, \dots, x_n}
\]

for the restriction of $\varphi$ in (2.4) to these parameters. Its image is contained in 
$W_+ := W \cap \Delta_+$, where $\Delta_+ = \{ q \in \widehat{\Delta}^{4^{n-1} - 1} \mid q_{x_1 \dots x_n} > 0,\ \sum_i x_i = 0 \}$ is the set of points with 
positive coordinates of the polytope delimited by the point $1_n = (1, \dots, 1)$ (which is 
one of the vertices of the simplex $\widehat{\Delta}^{4^{n-1} - 1}$), the points $q_i = (1, e_i)$, $i = 1, \dots, 4^{n-1} - 1$, where 
$e_i = (0, \dots, 1, \dots, 0) \in \mathbb{C}^{4^{n-1} - 1}$, and the centroid $(1, 0, \dots, 0)$ of $\widehat{\Delta}^{4^{n-1} - 1}$. We call 
the points of $W_+$ the biologically meaningful points of the model. This name will be 
justified in the forthcoming Remark 3.12, where we show that $W_+$ equals the image of 
$\varphi_+$. 

Figure 3. The polytope $\widehat{\Delta}^3_{\geq 0}$ in the simplex $\widehat{\Delta}^3$. It represents the 
points of $\widehat{\Delta}^3$ having non-negative Fourier coordinates. 

3. The geometry of Kimura 3-parameter model 

Let $V \subset \mathbb{C}^{4^{n-1}}$ be the affine variety associated to a tree T with n leaves, as defined 
above. In this section we are going to determine the singular points of the Kimura 
variety $W = V \cap \{q_{A \dots A} = 1\}$. To this aim, we will first prove that V is isomorphic 
to the quotient of $(\mathbb{C}^4)^{2n-3}$ by the action of a certain abelian group. 

In order to simplify notation, we briefly recall the notion of multigrading and refer 
to Chapter 8 of [MS05] for a nice introduction to multigraded polynomial rings (we 
also refer to [MS05] for the correspondence between toric varieties and quotients, 
although we need little preliminary knowledge of this subject for the results of this 
section). 

Notation 3.1. Let M be a monomial in $S = \mathbb{C}[\{ P^e_x \mid e \in E(T) \text{ and } x \in \Sigma \}]$. Then, 
M has the form $M = \prod_{e \in E(T)} (P^e)^{i(e)}$ where each $i(e) = (i(e)_A, i(e)_C, i(e)_G, i(e)_T)$ is 
composed of natural numbers. This notation means that 

\[
M = \prod_{e \in E(T),\ x \in \Sigma} (P^e_x)^{i(e)_x}.
\]

We call $\deg(i(e)) = i(e)_A + i(e)_C + i(e)_G + i(e)_T$. Each indeterminate in S has a 
natural multidegree in $\mathbb{Z}^{|E(T)|}$ defined as 

\[
\deg(P^e_x) = (0, \dots, 0, 1, 0, \dots, 0) \quad \text{for any } x \in \Sigma,
\]

with the 1 in the position corresponding to the edge $e$. 
Given a monomial $M \in S$ as above, we call $\deg(M) := (\deg(i(e)))_{e \in E(T)}$. Note that 
the image of $\theta$ of (2.6) is generated by monomials of degree $d \cdot (1, \dots, 1)$, so that 
they are multi-homogeneous with respect to the given grading. 

Notation 3.2. From now on, $\mathbb{Z}/(2)$ denotes the additive group whereas $\mathbb{Z}_2$ denotes the 
multiplicative group $\{1, -1\}$. 

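3.1. The 3-leaves case. We start by studying the case n = 3 (see Example 2.6). 
We call $V_3$ the corresponding affine variety in $\mathbb{C}^{16}$. The parameterization $\varphi$ in this 
case is: 

\[
\varphi : \mathbb{C}^{12} \longrightarrow \mathbb{C}^{16}, \qquad
\big( (P^{e_1}_A, P^{e_1}_C, P^{e_1}_G, P^{e_1}_T), (P^{e_2}_A, \dots, P^{e_2}_T), (P^{e_3}_A, \dots, P^{e_3}_T) \big)
\longmapsto (P^{e_1}_x P^{e_2}_y P^{e_3}_z)_{\{x + y + z = 0 \,\mid\, x, y, z \in \Sigma\}}
\]

In the next result we prove that $V_3$ is an affine GIT quotient [MS05, chapter 10]. 

Proposition 3.3. $V_3$ is isomorphic to the affine GIT quotient $\mathbb{C}^{12} /\!/ G$ where the 
group 

\[
G = \{ (\lambda_1, \lambda_2, \lambda_3, \varepsilon, \delta) \mid \lambda_i \in \mathbb{C}^*,\ (\varepsilon, \delta) \in \mathbb{Z}_2 \times \mathbb{Z}_2,\ \lambda_1 \lambda_2 \lambda_3 = 1 \} \cong (\mathbb{C}^*)^2 \times \mathbb{Z}_2 \times \mathbb{Z}_2
\]

acts on $\mathbb{C}^{12}$ sending $(P^{e_1}, P^{e_2}, P^{e_3})$ to 

\[
\big( \lambda_1 (P^{e_1}_A, \varepsilon P^{e_1}_C, \delta P^{e_1}_G, \varepsilon\delta P^{e_1}_T),\ 
\lambda_2 (P^{e_2}_A, \varepsilon P^{e_2}_C, \delta P^{e_2}_G, \varepsilon\delta P^{e_2}_T),\ 
\lambda_3 (P^{e_3}_A, \varepsilon P^{e_3}_C, \delta P^{e_3}_G, \varepsilon\delta P^{e_3}_T) \big).
\]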

In order to prove this proposition, we need a technical lemma that we state 
separately for future reference. 

Lemma 3.4. Let $i = (i_A, i_C, i_G, i_T)$, $j = (j_A, j_C, j_G, j_T)$, $k = (k_A, k_C, k_G, k_T)$ be 
4-tuples in $\mathbb{N}^4$. Then the set of indices $(i, j, k)$ that satisfy 

\[
\begin{cases}
i_A + i_C + i_G + i_T = 1 \\
j_A + j_C + j_G + j_T = 1 \\
k_A + k_C + k_G + k_T = 1 \\
i_C + i_T + j_C + j_T + k_C + k_T = 0 & \text{in } \mathbb{Z}/(2) \\
i_G + i_T + j_G + j_T + k_G + k_T = 0 & \text{in } \mathbb{Z}/(2)
\end{cases} \tag{3.1}
\]

is equal to 

\[
\{ (i, j, k) \mid i_x = j_y = k_z = 1,\ \deg(i) = \deg(j) = \deg(k) = 1,\ x + y + z = 0 \text{ in } H \}.
\]



Proof. Let $(i, j, k)$ satisfy (3.1). The first three equations imply that for each 
index, there is just one letter in $\Sigma$ such that the corresponding entry is non-zero, 
and in fact, equal to one. We write $i_x = 1$, $j_y = 1$, $k_z = 1$. As in subsection 2.2, 
we think of the letters in $\Sigma$ as elements in $\mathbb{Z}/(2) \times \mathbb{Z}/(2)$: $A = (0,0)$, $C = (1,0)$, 
$G = (0,1)$, $T = (1,1)$. We call 

\[
\begin{array}{lll}
I_{AC} = i_A + i_C, & I_{CT} = i_C + i_T, & I_{GT} = i_G + i_T, \\
J_{AC} = j_A + j_C, & J_{CT} = j_C + j_T, & J_{GT} = j_G + j_T, \\
K_{AC} = k_A + k_C, & K_{CT} = k_C + k_T, & K_{GT} = k_G + k_T.
\end{array}
\]

In this setting, $i_x = 1$ if and only if $x = (I_{CT}, I_{GT})$ in $\mathbb{Z}/(2) \times \mathbb{Z}/(2)$. Similarly, 
$j_y = 1$ (resp. $k_z = 1$) if and only if $y = (J_{CT}, J_{GT})$ (resp. $z = (K_{CT}, K_{GT})$) in 
$\mathbb{Z}/(2) \times \mathbb{Z}/(2)$. The last two equations in (3.1) can be written as 

\[
\begin{cases}
I_{CT} + J_{CT} + K_{CT} = 0 & \text{in } \mathbb{Z}/(2) \\
I_{GT} + J_{GT} + K_{GT} = 0 & \text{in } \mathbb{Z}/(2)
\end{cases}
\]

They imply that $x + y + z = 0$ in $\mathbb{Z}/(2) \times \mathbb{Z}/(2)$. 

As for the other inclusion, let $(i, j, k)$ be three 4-tuples in $\mathbb{N}^4$ of degree 1 ($\deg(i) = 
\deg(j) = \deg(k) = 1$) whose non-vanishing indices $i_x = 1$, $j_y = 1$, $k_z = 1$ satisfy 
$x + y + z = 0$. Then the first three equations in (3.1) are clearly satisfied. The 
last two equations also hold because $(i_C + i_T, i_G + i_T) = x$, $(j_C + j_T, j_G + j_T) = y$, 
$(k_C + k_T, k_G + k_T) = z$ and $x + y + z = 0$ in $\mathbb{Z}/(2) \times \mathbb{Z}/(2)$. □ 
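
Since both sets in Lemma 3.4 are finite, the statement can also be confirmed by enumeration. The sketch below is a brute-force check with hypothetical function names. It only searches tuples with entries 0 or 1, which is enough because any entry of 2 or more already violates the degree-one equations.

```python
from itertools import product

LETTERS = "ACGT"
H = {"A": (0, 0), "C": (1, 0), "G": (0, 1), "T": (1, 1)}


def satisfies_3_1(i, j, k):
    """The five conditions of (3.1); tuples are ordered (A, C, G, T)."""
    degrees_ok = sum(i) == 1 and sum(j) == 1 and sum(k) == 1
    ct = i[1] + i[3] + j[1] + j[3] + k[1] + k[3]
    gt = i[2] + i[3] + j[2] + j[3] + k[2] + k[3]
    return degrees_ok and ct % 2 == 0 and gt % 2 == 0


def unit(x):
    """Exponent tuple with a single 1 in the position of the letter x."""
    return tuple(int(x == y) for y in LETTERS)


def zero_sum_triples():
    """Triples of unit tuples whose letters add up to zero in H."""
    return {(unit(x), unit(y), unit(z))
            for x, y, z in product(LETTERS, repeat=3)
            if all((H[x][c] + H[y][c] + H[z][c]) % 2 == 0 for c in (0, 1))}


def solutions_of_3_1(max_entry=1):
    """All triples of 4-tuples with entries in 0..max_entry that satisfy (3.1)."""
    tuples = list(product(range(max_entry + 1), repeat=4))
    return {(i, j, k) for i, j, k in product(tuples, repeat=3) if satisfies_3_1(i, j, k)}


if __name__ == "__main__":
    print(len(solutions_of_3_1()), solutions_of_3_1() == zero_sum_triples())
```

Both sets contain 16 triples, one for each choice of x and y (z is then determined).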

We now prove the proposition above. 

Proof of Proposition 3.3. Recall that $\mathbb{C}^{12} /\!/ G$ is defined as $\mathrm{Spec}(S^G)$, the 
spectrum of the ring of invariants $S^G$ where $S = \mathbb{C}[P^{e_1}_A, \dots, P^{e_1}_T, P^{e_2}_A, \dots, P^{e_2}_T, P^{e_3}_A, \dots, P^{e_3}_T]$. 
This ring is generated as a $\mathbb{C}$-algebra by those monomials invariant by the action of 
G. 

Monomials in S are of the form $(P^{e_1})^i (P^{e_2})^j (P^{e_3})^k$ where $i, j, k$ are tuples of natural 
numbers $i = (i_A, i_C, i_G, i_T)$, $j = (j_A, j_C, j_G, j_T)$, $k = (k_A, k_C, k_G, k_T)$ and $(P^{e_1})^i$ 
means $(P^{e_1}_A)^{i_A} (P^{e_1}_C)^{i_C} (P^{e_1}_G)^{i_G} (P^{e_1}_T)^{i_T}$. A monomial is invariant under the action of G 
if and only if for any $(\lambda_1, \lambda_2, \lambda_3, \varepsilon, \delta)$ in G we have 

\[
\lambda_1^{i_A + \dots + i_T} \, \lambda_2^{j_A + \dots + j_T} \, \lambda_3^{k_A + \dots + k_T} \,
\varepsilon^{i_C + i_T + j_C + j_T + k_C + k_T} \, \delta^{i_G + i_T + j_G + j_T + k_G + k_T} = 1.
\]

This happens if and only if 

\[
\begin{cases}
i_A + \dots + i_T = j_A + \dots + j_T = k_A + \dots + k_T \\
i_C + i_T + j_C + j_T + k_C + k_T = 0 & \text{in } \mathbb{Z}/(2) \\
i_G + i_T + j_G + j_T + k_G + k_T = 0 & \text{in } \mathbb{Z}/(2)
\end{cases}
\]

Therefore $S^G$ is minimally generated as a $\mathbb{C}$-algebra by those monomials $(P^{e_1})^i (P^{e_2})^j (P^{e_3})^k$ 
that satisfy 

\[
\begin{cases}
i_A + i_C + i_G + i_T = 1 \\
j_A + j_C + j_G + j_T = 1 \\
k_A + k_C + k_G + k_T = 1 \\
i_C + i_T + j_C + j_T + k_C + k_T = 0 & \text{in } \mathbb{Z}/(2) \\
i_G + i_T + j_G + j_T + k_G + k_T = 0 & \text{in } \mathbb{Z}/(2)
\end{cases}
\]

By Lemma 3.4, this set of monomials is precisely 

\[
\{ P^{e_1}_x P^{e_2}_y P^{e_3}_z \mid x + y + z = 0 \text{ in } \mathbb{Z}/(2) \times \mathbb{Z}/(2) \}.
\]

Therefore $S^G$ is the finitely generated $\mathbb{C}$-algebra 

\[
\mathbb{C}[\{ P^{e_1}_x P^{e_2}_y P^{e_3}_z \mid x + y + z = 0 \text{ in } \mathbb{Z}/(2) \times \mathbb{Z}/(2) \}]
\]

which is isomorphic to the affine coordinate ring of $V_3$. □ 
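
The invariance of the 16 generating monomials can be illustrated numerically. In the sketch below a point of C^12 is stored as three dictionaries (one per edge), a group element is a tuple (lambda_1, lambda_2, lambda_3) with product 1 together with two signs, and each monomial is evaluated before and after the action. The test point, the chosen scalars and all names are arbitrary assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

LETTERS = "ACGT"
H = {"A": (0, 0), "C": (1, 0), "G": (0, 1), "T": (1, 1)}


def act(point, lambdas, eps, delta):
    """Action of (lambda_1, lambda_2, lambda_3, eps, delta) as in Proposition 3.3."""
    sign = {"A": 1, "C": eps, "G": delta, "T": eps * delta}
    return [{x: lam * sign[x] * coords[x] for x in LETTERS}
            for lam, coords in zip(lambdas, point)]


def monomial(point, xyz):
    """Value of P^{e_1}_x P^{e_2}_y P^{e_3}_z at the given point."""
    return point[0][xyz[0]] * point[1][xyz[1]] * point[2][xyz[2]]


def sums_to_zero(xyz):
    """True when the three letters add up to zero in H."""
    return all(sum(H[x][c] for x in xyz) % 2 == 0 for c in (0, 1))


# Arbitrary test data: a point with nonzero coordinates and scalars with product 1.
POINT = [{x: Fraction(2 + n + 5 * e, 3) for n, x in enumerate(LETTERS)} for e in range(3)]
LAMBDAS = (Fraction(3), Fraction(5, 7), Fraction(7, 15))

if __name__ == "__main__":
    triples = [t for t in product(LETTERS, repeat=3) if sums_to_zero(t)]
    for eps, delta in product((1, -1), repeat=2):
        moved = act(POINT, LAMBDAS, eps, delta)
        print(eps, delta, all(monomial(moved, t) == monomial(POINT, t) for t in triples))
```

A monomial whose letters do not add up to zero, such as the one indexed by (A, A, C), changes sign under eps = -1, so it is not invariant.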

3.2. The general case. Now we generalize Proposition 3.3 to trees with an 
arbitrary number n of leaves. Recall that the number of edges in such a tree is $2n - 3$. 
By a path $\sigma$ in T we mean a minimal subgraph of T connecting two nodes (interior 
nodes or leaves). We write $\sigma = \{s_1, \dots, s_r\}$ for the sequence of edges in $\sigma$. Let 

\[
G \subset (\mathbb{C}^* \times \mathbb{Z}_2 \times \mathbb{Z}_2)^{2n-3}
\]

be defined as the subset composed of the elements $(\lambda_e, \varepsilon_e, \delta_e)_{e \in E(T)}$ such that 
$\prod_{e \in E(T)} \lambda_e = 1$ and satisfying the following condition 

(*) for any path $\sigma = \{s_1, \dots, s_r\}$ between two leaves in T, $\prod_{i=1}^{r} \varepsilon_{s_i} = 1$ and 
$\prod_{i=1}^{r} \delta_{s_i} = 1$. 

It is immediate to see that the natural product induces a group structure in G. 
Moreover, we have 

Lemma 3.5. The group G is isomorphic to $(\mathbb{C}^*)^{2n-4} \times (\mathbb{Z}_2 \times \mathbb{Z}_2)^{n-2}$. 

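Proof. For any interior node $v$ of T, let $e^1_v, e^2_v, e^3_v$ be the edges incident at $v$, and 
write $\varepsilon(v) = \{\varepsilon_e\}_{e \in E(T)}$ by taking $\varepsilon_{e^1_v} = \varepsilon_{e^2_v} = \varepsilon_{e^3_v} = -1$, $\varepsilon_e = 1$ for the remaining 
edges. Similarly, we define $\delta(v)$. Let $e_0 \in E(T)$ and take the group homomorphism 

\[
\psi : (\mathbb{C}^*)^{|E(T)| - 1} \times (\mathbb{Z}_2 \times \mathbb{Z}_2)^{|N(T)|} \longrightarrow (\mathbb{C}^* \times \mathbb{Z}_2 \times \mathbb{Z}_2)^{2n-3}
\]

defined by mapping $((\mu_e)_{e \neq e_0}, (\varepsilon(v), \delta(v))_{v \in N(T)})$ to $(\lambda_e, \varepsilon_e, \delta_e)_{e \in E(T)}$, where $\lambda_e = \mu_e$ if 
$e \neq e_0$, $\lambda_{e_0} = (\prod_{e \neq e_0} \mu_e)^{-1}$ and $\varepsilon_e = \prod_{v \in e} \varepsilon(v)$, $\delta_e = \prod_{v \in e} \delta(v)$. The image of $\psi$ is 
G and it is easy to check that $\psi$ is a monomorphism. The claim follows. □ 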
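
The sign part of Lemma 3.5 can be checked by enumeration on the 4-leaf tree. The sketch below rests on one reading of the proof, namely that the sign of an edge is the product of the signs chosen at its interior endpoints; the tree encoding, the node names `u` and `w`, and the function names are assumptions. It lists all sign vectors on the five edges that satisfy condition (*) and compares them with the vectors built from node signs.

```python
from itertools import combinations, product

EDGES = [(1, "u"), (2, "u"), ("u", "w"), (3, "w"), (4, "w")]
LEAVES = [1, 2, 3, 4]
INTERIOR = ["u", "w"]


def path_edges(start, target, used=frozenset()):
    """Indices in EDGES of the unique path from start to target, or None."""
    if start == target:
        return []
    for idx, (p, q) in enumerate(EDGES):
        if idx in used or start not in (p, q):
            continue
        rest = path_edges(q if start == p else p, target, used | {idx})
        if rest is not None:
            return [idx] + rest
    return None


def satisfies_star(signs):
    """Condition (*) for one family of signs: product 1 along every leaf-to-leaf path."""
    for a, b in combinations(LEAVES, 2):
        prod = 1
        for idx in path_edges(a, b):
            prod *= signs[idx]
        if prod != 1:
            return False
    return True


def signs_from_nodes(node_sign):
    """Edge signs obtained by multiplying the signs chosen at the interior endpoints."""
    out = []
    for e in EDGES:
        s = 1
        for v in e:
            s *= node_sign.get(v, 1)   # leaves contribute the trivial sign
        out.append(s)
    return tuple(out)


if __name__ == "__main__":
    solutions = {s for s in product((1, -1), repeat=len(EDGES)) if satisfies_star(s)}
    built = {signs_from_nodes(dict(zip(INTERIOR, c))) for c in product((1, -1), repeat=2)}
    print(len(solutions), solutions == built)
```

For n = 4 there are 4 = 2^(n-2) admissible sign vectors for each of the two sign families, which matches the factor $(\mathbb{Z}_2 \times \mathbb{Z}_2)^{n-2}$ in the lemma.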

The main result of this section is the following theorem. 

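Theorem 3.6. Let T be a tree with n leaves and let G be the group defined above. 
Let G act on $\prod_{e \in E(T)} \mathbb{C}^4$ by sending $(P^e_A, P^e_C, P^e_G, P^e_T)_{e \in E(T)}$ to 

\[
\big( \lambda_e (P^e_A, \varepsilon_e P^e_C, \delta_e P^e_G, \varepsilon_e \delta_e P^e_T) \big)_{e \in E(T)}.
\]

Then V is isomorphic to $(\mathbb{C}^4)^{2n-3} /\!/ G$. 

Proof. We need to check that the affine coordinate rings of V and $(\mathbb{C}^4)^{2n-3} /\!/ G$ 
are isomorphic. If S is the algebra $S = \mathbb{C}[\{ P^e_x \mid e \in E(T) \text{ and } x \in \Sigma \}]$, we need to 
check that the ring of invariants $S^G$ is isomorphic to 

\[
\mathbb{C}\Big[ \Big\{ \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P^e_{x_e} \ \Big|\ x_l \in \Sigma,\ x_{e_l} = x_l\ \forall l \in \{1, \dots, n\} \Big\} \Big].
\]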

Let M G S be a monomial. Then, M has the form 

m= n ( pe ) i(e) ' 

e£E(T) 

with the notation introduced in 13.11 This monomial is invariant by the action of G 
if and only if for any (A e , e e , <5 e )ee£(T) G G we have 

(3.2) 1 = Y[ \ deg( - i( - e ^ Y[ 5 i ( e )c+ i ( e )T J~J £i(e) G +i(e) T _ 

egS(T) ee£(T) eeS(T) 

A s rieg£;(T) = 1, equation (13.21) implies deg(i(e)) = deg(i(e')) for all e,e' G E(T) 
(in the language of the previous section, this means that M is multi-homogeneous). 
Therefore the algebra S G is generated by monomials that satisfy deg(i(e)) = 1 for 
all edges e G E(T). We assume from now on that M satisfies this condition. 

Let $v$ be an interior node of $T$, and let $e_v^1, e_v^2, e_v^3$ be the edges incident at $v$. If we take $\varepsilon_{e_v^1} = \varepsilon_{e_v^2} = \varepsilon_{e_v^3} = -1$ (resp. $\delta_{e_v^1} = \delta_{e_v^2} = \delta_{e_v^3} = -1$) and $\varepsilon_e = 1$ (resp. $\delta_e = 1$) for the remaining edges, condition (*) is satisfied. For this particular choice, equation (3.2) implies

$$\begin{cases}
i(e_v^1)_C + i(e_v^1)_T + i(e_v^2)_C + i(e_v^2)_T + i(e_v^3)_C + i(e_v^3)_T = 0 & \text{in } \mathbb{Z}/(2) \\
i(e_v^1)_G + i(e_v^1)_T + i(e_v^2)_G + i(e_v^2)_T + i(e_v^3)_G + i(e_v^3)_T = 0 & \text{in } \mathbb{Z}/(2)
\end{cases}$$

Lemma 3.4 tells us that $M = \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P_{x_e}^e$.

We need to prove the converse: if $M = \prod_{e \in E(T),\ x(v) = 0\ \forall v \in N(T)} P_{x_e}^e$ for some given $x_1, \dots, x_n$, we shall check that it is invariant by the action of $G$. In other words, if $\{i(e)\}_{e \in E(T)}$ denotes the set of exponents in $M$, we are going to check that equation (3.2) holds. By Lemma 3.4, the condition $x_{e_v^1} + x_{e_v^2} + x_{e_v^3} = 0$ is equivalent to

(3.3) $$\begin{cases}
i(e_v^1)_C + i(e_v^1)_T + i(e_v^2)_C + i(e_v^2)_T + i(e_v^3)_C + i(e_v^3)_T = 0 & \text{in } \mathbb{Z}/(2) \\
i(e_v^1)_G + i(e_v^1)_T + i(e_v^2)_G + i(e_v^2)_T + i(e_v^3)_G + i(e_v^3)_T = 0 & \text{in } \mathbb{Z}/(2)
\end{cases}$$



Claim: If condition (3.3) holds for any $v \in N(T)$, the sets $\gamma_{CT} = \{e \in E(T) \mid i(e)_C + i(e)_T = 1\}$ and $\gamma_{GT} = \{e \in E(T) \mid i(e)_G + i(e)_T = 1\}$ are unions of disjoint paths between leaves of $T$.

Proof: If $x_l = A$ for all leaves $l$, then the set of conditions $x_{e_v^1} + x_{e_v^2} + x_{e_v^3} = 0$ leads to $x_e = A$ for all $e \in E(T)$, so there is nothing to prove in this case. We assume that there is a leaf $l$ such that $x_l \neq A$ and we assume that $x_l \in \{C, T\}$ (if $x_l \in \{G, T\}$ we proceed analogously). Then $e_l$ belongs to $\gamma_{CT}$.

Let $v$ be the interior node connecting the edge $e_l$ to the rest of the tree. As $e_l$ is one of the edges meeting at $v$, the condition $i(e_v^1)_C + i(e_v^1)_T + i(e_v^2)_C + i(e_v^2)_T + i(e_v^3)_C + i(e_v^3)_T = 0$ in $\mathbb{Z}/(2)$ implies that exactly one of the other two edges emerging from $v$ also belongs to $\gamma_{CT}$. We call this edge $e_{vw}$, where $w$ is the other extreme of the edge. Then, for $w$, the condition $i(e_w^1)_C + i(e_w^1)_T + i(e_w^2)_C + i(e_w^2)_T + i(e_w^3)_C + i(e_w^3)_T = 0$ in $\mathbb{Z}/(2)$ again implies that one of the other two edges emerging from $w$ belongs to $\gamma_{CT}$. We repeat this process until we end at another leaf of $T$. We note that any two paths obtained this way are disjoint because an interior node cannot have three edges in $\gamma_{CT}$. Therefore the claim is proved.

The claim immediately implies that the monomial $M$ satisfies equation (3.2), since elements in the group $G$ are defined by the condition (*). □



Corollary 3.7. The coordinate ring of the Kimura variety $W = V \cap \{q_{A \dots A} = 1\}$ is isomorphic to $S'^{G'}$, where

$$S' = \mathbb{C}[\{P_x^e \mid e \in E(T) \text{ and } x \in \Sigma \setminus \{A\}\}]$$

and $G' = (\mathbb{Z}_2 \times \mathbb{Z}_2)^{n-2}$ acts as a subgroup of the group $G$ defined in Theorem 3.6. Equivalently, $W$ is the affine GIT quotient of $\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P_A^e = 1\})$ modulo $G'$,

$$\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P_A^e = 1\}) // G'.$$

Proof. Using diagram (2.7) we see that $A(W)$ is isomorphic to $\rho(A(V))$. By Theorem 3.6 we know that $A(V) \cong S^G$, so it is enough to prove $\rho(S^G) = S'^{G'}$.

We first prove $\rho(S^G) \subset S'^{G'}$. Let $M((P_A^e, P_C^e, P_G^e, P_T^e)_{e \in E(T)})$ be a monomial in $S^G$. Then $\rho(M) = M((1, P_C^e, P_G^e, P_T^e)_{e \in E(T)})$. Let $g' = (\varepsilon_e, \delta_e)_{e \in E(T)}$ be an element in $G'$. The action of $g'$ on $\rho(M)$ is $g' \cdot \rho(M) = M((1, \varepsilon_e P_C^e, \delta_e P_G^e, \varepsilon_e \delta_e P_T^e)_{e \in E(T)})$. Take $g = (1, \varepsilon_e, \delta_e)_{e \in E(T)}$, which is an element of $G$, and notice that

$$\rho(g \cdot M) = g' \cdot \rho(M).$$

As $M$ is invariant by the action of $G$, it is also invariant by this element $g$. Hence $g' \cdot \rho(M) = \rho(M)$.

In order to prove the other inclusion we will use the multigrading notation introduced in 3.1. Let $M((P_C^e, P_G^e, P_T^e)_{e \in E(T)}) = \prod_{e \in E(T)} (P^e)^{i(e)}$ be a monomial in $S'^{G'}$. As $S'$ is a subring of $S$, there is also a multidegree associated to $M$, namely $\deg(M) = (\deg(i(e)))_{e \in E(T)}$. Now we make $M$ multi-homogeneous: let $d$ be equal to $\max_{e \in E(T)} \deg(i(e))$ and consider the monomial $\tilde{M} := M \prod_{e \in E(T)} (P_A^e)^{d - \deg(i(e))}$. Then $\tilde{M}$ is a monomial in $S$ invariant by the action of $G$, because $M$ was invariant by $G'$ and $\tilde{M}$ is multi-homogeneous (so that equation (3.2) holds). Moreover $\rho(\tilde{M}) = M$ and we are done. □

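Remark 3.8. It is worth pointing out that the action of $\mathbb{Z}_2 \times \mathbb{Z}_2$ on $\Delta^3$,

$$(\varepsilon, \delta) \cdot (P_C, P_G, P_T) = (\varepsilon P_C,\ \delta P_G,\ \varepsilon \delta P_T),$$

is just the reflection relative to one of the axes going through the centroid of $\Delta^3$. Namely, the actions of $g_1 = (-1, 1)$, $g_2 = (1, -1)$ and $g_3 = (-1, -1)$ are the reflections relative to the $P_G$-axis, the $P_C$-axis and the $P_T$-axis, respectively. Thus, if we write $\Delta_{x,y} = \{P \in \Delta^3 \mid P_x, P_y < 0\}$ for any $x, y \in \Sigma$, then $g_1(\Delta^3_+) = \Delta_{C,T}$, $g_2(\Delta^3_+) = \Delta_{G,T}$ and $g_3(\Delta^3_+) = \Delta_{C,G}$.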

Corollary 3.9. The Kimura variety $W$ is the geometric quotient

$$\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P_A^e = 1\}) / G'$$

and coincides with the image of $\varphi$ (cf. Lemma 2.10).

Proof. By Corollary 3.7, the variety $W$ is the categorical quotient defined by $\mathrm{Spec}(S'^{G'})$. As $G'$ is a finite group, the orbits of $G'$ are closed and this categorical quotient is precisely the geometric quotient $\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P_A^e = 1\}) / G'$; therefore, it coincides with the image of $\varphi$ (see Example 6.1 of [Dol03]). □

From this, we deduce the identifiability of the model (see subsection 2.4 and [Cha96]). In particular, we have:

Corollary 3.10. The Kimura variety $W \subset \mathbb{C}^{4^{n-1}}$ has dimension $3(2n - 3)$ and codimension $4^{n-1} - 6n + 9$.

Corollary 3.11. Let $q$ be a point in the Kimura variety $W$. Then

$$|\varphi^{-1}(q)| \le 4^{n-2}$$

and the equality holds for generic points. The same holds for $q \in W_{\mathbb{R}} = \varphi((\mathbb{R}^3)^{2n-3})$ or $q \in W_{\Delta} = \varphi((\Delta^3)^{2n-3})$.






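Proof. By Corollary 3.9, $W$ is the image of $\varphi$ in the commutative diagram

$$\begin{array}{ccc}
\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P_A^e = 1\}) & \xrightarrow{\ \varphi\ } & W \\
\downarrow & \nearrow_{\cong} & \\
\prod_{e \in E(T)} (\mathbb{C}^4 \cap \{P_A^e = 1\}) / G' & &
\end{array}$$

It follows that if $q \in W$, $\varphi^{-1}(q)$ consists of one point $(1, P_C^e, P_G^e, P_T^e)_{e \in E(T)}$ and its images under the action of $G'$. Since $|G'| = 4^{n-2}$, the claim follows. Moreover, the image of a point in $\mathbb{R}^3$ (resp. $\Delta^3$) under the action of $G'$ stays in $\mathbb{R}^3$ (resp. $\Delta^3$). □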
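
The fibre count of Corollary 3.11 can be illustrated in the smallest case $n = 3$, where $G'$ has a single factor $\mathbb{Z}_2 \times \mathbb{Z}_2$ attached to the unique interior node. The sketch below assumes the monomial parametrization $q_{xyz} = P^1_x P^2_y P^3_z$ with $x + y + z = 0$ and encodes $\Sigma$ as the integers 0 to 3 with XOR as the group law; the numerical parameter values are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

A, C, G, T = 0, 1, 2, 3  # Z_2 x Z_2 encoded so that addition is bitwise XOR


def phi(params):
    """Tripod parametrization: q_xyz = P^1_x * P^2_y * P^3_z for x + y + z = 0."""
    p1, p2, p3 = params
    return tuple(
        p1[x] * p2[y] * p3[x ^ y] for x, y in product(range(4), repeat=2)
    )


def act(eps, delta, params):
    """Apply (eps, delta) on every edge: P_C -> eps*P_C, P_G -> delta*P_G, P_T -> eps*delta*P_T."""
    return tuple(
        (p[A], eps * p[C], delta * p[G], eps * delta * p[T]) for p in params
    )


# An arbitrary point with P_A = 1 and positive remaining coordinates.
point = (
    (Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)),
    (Fraction(1), Fraction(2, 3), Fraction(1, 7), Fraction(3, 4)),
    (Fraction(1), Fraction(1, 4), Fraction(5, 6), Fraction(2, 5)),
)

# Orbit of the point under G' and the images of the orbit under phi.
orbit = {act(e, d, point) for e, d in product((1, -1), repeat=2)}
images = {phi(p) for p in orbit}
print(len(orbit), len(images))
```

The orbit has $4^{n-2} = 4$ distinct points and all of them have the same image, and only the original point has all coordinates positive.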

Remark 3.12. Notice that the pre-images by $\varphi$ of the point $1_n = (1, \dots, 1)$ are

$$\varphi^{-1}(1_n) = \{(1, \varepsilon_e P_C^e, \delta_e P_G^e, \varepsilon_e \delta_e P_T^e)_{e \in E(T)} \mid (\varepsilon_e, \delta_e)_{e \in E(T)} \in G'\}, \quad \text{with } P_C^e = P_G^e = P_T^e = 1.$$

Among all of them, there is only one with biological interest: $((1, 1, 1, 1)_{e \in E(T)})$, which represents the situation where, in probability parameters, the transition matrices of all edges are equal to the identity (no mutation occurs). In general, for any point $q \in W_+$ with real coordinates, there is just one preimage of biological interest.

Keeping the notation of subsection 2.4, it follows from Corollary 3.9 that $\varphi_+$ is injective and that it is actually a bijection onto $W_+$. This fact justifies the name biologically meaningful points given to the points of $W_+$. The following result tells us that if $q \in W$ has only positive coordinates, then $q$ is non-singular and $|\varphi^{-1}(q)| = 4^{n-2}$. Among all these pre-images, just the single one in $\prod_{e \in E(T)} \Delta^3_+$ has biological meaning.

Corollary 3.13. A point $q = \varphi_{\Delta}(p) \in W_{\Delta}$ is singular if and only if there is some $e \in E(T)$ such that $P^e \in \Delta^3 \cap \{P_C^e P_G^e P_T^e = 0\}$. In particular, no point with biological meaning is singular.

Proof. It is well-known that the singular points on $W$ are the points $\{q \in W \mid |\varphi^{-1}(q)| < 4^{n-2}\}$, i.e. those points for which at least one of their pre-images is invariant by the action of some $g \in G'$. By Remark 3.8 we know that these are precisely the points lying on an axis $P_x^e = 0$ for some $e \in E(T)$ and some $x \neq A$. As a consequence, the points in $W_+ = \varphi(\prod_{e \in E(T)} \Delta^3_+)$ are non-singular. □

4. Local Complete Intersection 

Given a tree with $n$ leaves, the main purpose of this section is to describe a procedure to determine a local complete intersection equal to the variety $W = W_n$ in the open set $\Delta_+$.

Notation 4.1. For every $n \in \mathbb{N}$, we will write $c(n) = 4^{n-1} - 6n + 9$. Note that in virtue of Corollary 3.10, $c(n)$ is the codimension of $W_n$.

In Corollary 3.13 we have seen that the points of $(W_n)_+$ are non-singular. Since any regular local ring is a complete intersection, the variety $W_n$ is a local complete intersection at these points, i.e. the ideal $I_W$ can be generated by $c(n)$ polynomials in a neighborhood of these points or, more precisely, a minimal system of generators for the localization of $I_W$ at these points consists of $c(n) = 4^{n-1} - 6n + 9$ elements.

The following proposition provides a minimal system of generators for this ideal in case the tree $T$ has $n = 3$ leaves.

Proposition 4.2. Let $T$ be a tree with 3 leaves and let $W_3 \subset \mathbb{C}^{16}$ be the model associated to it. Then, the set of quartics

$$\begin{aligned}
h_1 &= q_{AAA}q_{ATT}q_{TCG}q_{TGC} - q_{ACC}q_{AGG}q_{TAT}q_{TTA} \\
h_2 &= q_{CCA}q_{CTG}q_{TAT}q_{TGC} - q_{CAC}q_{CGT}q_{TCG}q_{TTA} \\
h_3 &= q_{AGG}q_{ATT}q_{CAC}q_{CCA} - q_{AAA}q_{ACC}q_{CGT}q_{CTG} \\
h_4 &= q_{ACC}q_{ATT}q_{GAG}q_{GGA} - q_{AAA}q_{AGG}q_{GCT}q_{GTC} \\
h_5 &= q_{CAC}q_{CTG}q_{GCT}q_{GGA} - q_{CCA}q_{CGT}q_{GAG}q_{GTC} \\
h_6 &= q_{GGA}q_{GTC}q_{TAT}q_{TCG} - q_{GAG}q_{GCT}q_{TGC}q_{TTA}
\end{aligned}$$

together with the equation $h = q_{AAA} - 1$ is a local minimal system of generators for the ideal of $W_3$ at the points of $(W_3)_+$. Namely, $\{h_1, h_2, h_3, h_4, h_5, h_6, q_{AAA} - 1\}$ generate the ideal $I_{W_3}$ in the local ring $\mathcal{O}_{W,q}$, for any $q \in (W_3)_+$.
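
As a sanity check on the transcription of the six quartics, one can evaluate them on a point of the model. This is a minimal sketch, assuming the monomial parametrization $q_{xyz} = P^1_x P^2_y P^3_z$ with $x + y + z = 0$ and encoding the letters A, C, G, T as 0, 1, 2, 3 with XOR as the group law; the parameter values are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

LETTERS = "ACGT"  # the index of a letter encodes an element of Z_2 x Z_2 (sum = XOR)

# Quartics h_1..h_6 transcribed from Proposition 4.2 as "monomial - monomial".
QUARTICS = [
    "AAA ATT TCG TGC - ACC AGG TAT TTA",
    "CCA CTG TAT TGC - CAC CGT TCG TTA",
    "AGG ATT CAC CCA - AAA ACC CGT CTG",
    "ACC ATT GAG GGA - AAA AGG GCT GTC",
    "CAC CTG GCT GGA - CCA CGT GAG GTC",
    "GGA GTC TAT TCG - GAG GCT TGC TTA",
]


def tripod_point(p1, p2, p3):
    """Fourier coordinates q_xyz = P^1_x P^2_y P^3_z, indexed by words with x + y + z = 0."""
    q = {}
    for i, j in product(range(4), repeat=2):
        word = LETTERS[i] + LETTERS[j] + LETTERS[i ^ j]
        q[word] = p1[i] * p2[j] * p3[i ^ j]
    return q


def evaluate(quartic, q):
    """Evaluate a binomial given as 'a b c d - e f g h' at the point q."""
    plus, minus = (side.split() for side in quartic.split(" - "))
    value_plus = value_minus = Fraction(1)
    for word in plus:
        value_plus *= q[word]
    for word in minus:
        value_minus *= q[word]
    return value_plus - value_minus


# Illustrative parameters with P_A = 1 on every edge (so that q_AAA = 1).
p1 = [Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
p2 = [Fraction(1), Fraction(2, 3), Fraction(1, 7), Fraction(3, 4)]
p3 = [Fraction(1), Fraction(1, 4), Fraction(5, 6), Fraction(2, 5)]
q = tripod_point(p1, p2, p3)

values = [evaluate(h, q) for h in QUARTICS]
print(values, q["AAA"] - 1)
```

Each binomial vanishes because its two monomials use the same multiset of letters on every edge; this checks membership in the ideal only, not the generation statement of the proposition.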

Remark 4.3. It is worth pointing out that the minimal system of generators given in Proposition 4.2 does not depend on the point $q \in (W_3)_+$. In the same way, the local complete intersection we will construct for an arbitrary tree of $n$ leaves will be the same for all points in $(W_n)_+$.



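Proof of Proposition 4.2. Let $W' \subset \mathbb{C}^{16}$ be the variety defined by the zero set of the ideal $(h_1, h_2, h_3, h_4, h_5, h_6, h = q_{AAA} - 1)$ and let $q \in (W_3)_+$. Let $\mathrm{Jac}_{W'}(q)$ be the jacobian matrix of $W'$ at $q$:

$$\begin{array}{c|cccccccccccccccc}
 & q_{AAA} & q_{ACC} & q_{AGG} & q_{ATT} & q_{CAC} & q_{CCA} & q_{CGT} & q_{CTG} & q_{GAG} & q_{GCT} & q_{GGA} & q_{GTC} & q_{TAT} & q_{TCG} & q_{TGC} & q_{TTA} \\ \hline
h_1 & * & * & * & * & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & * & * & * & * \\
h_2 & 0 & 0 & 0 & 0 & * & * & * & * & 0 & 0 & 0 & 0 & * & * & * & * \\
h_3 & * & * & * & * & * & * & * & * & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
h_4 & * & * & * & * & 0 & 0 & 0 & 0 & * & * & * & * & 0 & 0 & 0 & 0 \\
h_5 & 0 & 0 & 0 & 0 & * & * & * & * & * & * & * & * & 0 & 0 & 0 & 0 \\
h_6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & * & * & * & * & * & * & * & * \\
h & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}$$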
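with entries

$$\begin{aligned}
(h_1, q_{AAA}) &= q_{ATT}q_{TCG}q_{TGC} & (h_1, q_{ACC}) &= -q_{AGG}q_{TAT}q_{TTA} \\
(h_1, q_{AGG}) &= -q_{ACC}q_{TAT}q_{TTA} & (h_1, q_{ATT}) &= q_{AAA}q_{TCG}q_{TGC} \\
(h_1, q_{TAT}) &= -q_{ACC}q_{AGG}q_{TTA} & (h_1, q_{TCG}) &= q_{AAA}q_{ATT}q_{TGC} \\
(h_1, q_{TGC}) &= q_{AAA}q_{ATT}q_{TCG} & (h_1, q_{TTA}) &= -q_{ACC}q_{AGG}q_{TAT} \\[4pt]
(h_2, q_{CAC}) &= -q_{CGT}q_{TCG}q_{TTA} & (h_2, q_{CCA}) &= q_{CTG}q_{TAT}q_{TGC} \\
(h_2, q_{CGT}) &= -q_{CAC}q_{TCG}q_{TTA} & (h_2, q_{CTG}) &= q_{CCA}q_{TAT}q_{TGC} \\
(h_2, q_{TAT}) &= q_{CCA}q_{CTG}q_{TGC} & (h_2, q_{TCG}) &= -q_{CAC}q_{CGT}q_{TTA} \\
(h_2, q_{TGC}) &= q_{CCA}q_{CTG}q_{TAT} & (h_2, q_{TTA}) &= -q_{CAC}q_{CGT}q_{TCG} \\[4pt]
(h_3, q_{AAA}) &= -q_{ACC}q_{CGT}q_{CTG} & (h_3, q_{ACC}) &= -q_{AAA}q_{CGT}q_{CTG} \\
(h_3, q_{AGG}) &= q_{ATT}q_{CAC}q_{CCA} & (h_3, q_{ATT}) &= q_{AGG}q_{CAC}q_{CCA} \\
(h_3, q_{CAC}) &= q_{AGG}q_{ATT}q_{CCA} & (h_3, q_{CCA}) &= q_{AGG}q_{ATT}q_{CAC} \\
(h_3, q_{CGT}) &= -q_{AAA}q_{ACC}q_{CTG} & (h_3, q_{CTG}) &= -q_{AAA}q_{ACC}q_{CGT} \\[4pt]
(h_4, q_{AAA}) &= -q_{AGG}q_{GCT}q_{GTC} & (h_4, q_{ACC}) &= q_{ATT}q_{GAG}q_{GGA} \\
(h_4, q_{AGG}) &= -q_{AAA}q_{GCT}q_{GTC} & (h_4, q_{ATT}) &= q_{ACC}q_{GAG}q_{GGA} \\
(h_4, q_{GAG}) &= q_{ACC}q_{ATT}q_{GGA} & (h_4, q_{GCT}) &= -q_{AAA}q_{AGG}q_{GTC} \\
(h_4, q_{GGA}) &= q_{ACC}q_{ATT}q_{GAG} & (h_4, q_{GTC}) &= -q_{AAA}q_{AGG}q_{GCT} \\[4pt]
(h_5, q_{CAC}) &= q_{CTG}q_{GCT}q_{GGA} & (h_5, q_{CCA}) &= -q_{CGT}q_{GAG}q_{GTC} \\
(h_5, q_{CGT}) &= -q_{CCA}q_{GAG}q_{GTC} & (h_5, q_{CTG}) &= q_{CAC}q_{GCT}q_{GGA} \\
(h_5, q_{GAG}) &= -q_{CCA}q_{CGT}q_{GTC} & (h_5, q_{GCT}) &= q_{CAC}q_{CTG}q_{GGA} \\
(h_5, q_{GGA}) &= q_{CAC}q_{CTG}q_{GCT} & (h_5, q_{GTC}) &= -q_{CCA}q_{CGT}q_{GAG} \\[4pt]
(h_6, q_{GAG}) &= -q_{GCT}q_{TGC}q_{TTA} & (h_6, q_{GCT}) &= -q_{GAG}q_{TGC}q_{TTA} \\
(h_6, q_{GGA}) &= q_{GTC}q_{TAT}q_{TCG} & (h_6, q_{GTC}) &= q_{GGA}q_{TAT}q_{TCG} \\
(h_6, q_{TAT}) &= q_{GGA}q_{GTC}q_{TCG} & (h_6, q_{TCG}) &= q_{GGA}q_{GTC}q_{TAT} \\
(h_6, q_{TGC}) &= -q_{GAG}q_{GCT}q_{TTA} & (h_6, q_{TTA}) &= -q_{GAG}q_{GCT}q_{TGC}
\end{aligned}$$
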
In general, one has that $\mathrm{rk}(\mathrm{Jac}_q(W')) \le \mathrm{codim}_q(W') \le 7$. It can be checked by direct computation that the determinant of the $6 \times 6$-matrix obtained from $\mathrm{Jac}_{W'}(q)$ by removing the last row and keeping the columns indexed by $q_{ACC}, q_{ATT}, q_{CAC}, q_{CTG}, q_{TAT}, q_{TCG}$ equals



^iGAGUGCTlGGAlGTC 



QaaaQagcqattqcgaqggtQctgQtatqtggQtgc 

qAAAQACClAGGgCCAqCGTgcTG'lTATlTGCQTTA 
qAAAlACClATTqCACqcGTlCTGlTCGlTGClTTA 
qAAAq 2 ACClAGGqCACqcGT qCTG qT AT qTCGq%TA 



( 

+ 



) 



qAAAqAGGqATTqCACqcCAqCTGqTATqTCGqTGC + 

qAccq 2 AGGiATTqc Acqhc AqCTGqT ATqTGCqTT A + 

qAAAqAGGqATTqCAClCCAqCGTqTCGlTGCqTTA + 
qACCq 2 AGGlATTq%AClCCAqCGTqTATqTCGqTTA + 



which is clearly positive in $\Delta_+$. Therefore, $\mathrm{rk}(\mathrm{Jac}_q(W')) = 7$ and so $q$ is a non-singular point of $W'$ and $W'$ is a local complete intersection at $q$. Therefore, $W' \subset \mathbb{C}^{16}$ is a subvariety of dimension 9 containing $W_3$, which also has dimension 9 and is non-singular at $q$. It follows that $W'$ and $W_3$ coincide in a neighborhood of $q$ and we are done. □

Remark 4.4. For future reference, it is worth noting that the matrix $J'$ obtained from $\mathrm{Jac}_q(W')$ by removing the columns $q_{AAA}, q_{CCA}, q_{GGA}, q_{TTA}$ and the last row has maximal rank equal to 6.
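
The rank statements in the proof of Proposition 4.2 and in Remark 4.4 can be checked numerically at a single point. The sketch below assembles the jacobian from the binomials $h_1, \dots, h_6$ and the equation $q_{AAA} - 1$ and computes ranks with exact rational arithmetic. It evaluates at the point with all coordinates equal to 1 (the image of $P_x^e = 1$), an illustrative choice; it does not reproduce the symbolic determinant argument.

```python
from fractions import Fraction
from itertools import product

LETTERS = "ACGT"
QUARTICS = [
    "AAA ATT TCG TGC - ACC AGG TAT TTA",
    "CCA CTG TAT TGC - CAC CGT TCG TTA",
    "AGG ATT CAC CCA - AAA ACC CGT CTG",
    "ACC ATT GAG GGA - AAA AGG GCT GTC",
    "CAC CTG GCT GGA - CCA CGT GAG GTC",
    "GGA GTC TAT TCG - GAG GCT TGC TTA",
]
# Column order q_AAA, q_ACC, ..., q_TTA (words xyz with x + y + z = 0, sum = XOR).
VARS = [x + y + LETTERS[i ^ j]
        for (i, x), (j, y) in product(enumerate(LETTERS), repeat=2)]


def jacobian(q):
    """Rows: gradients of h_1..h_6, then the gradient of q_AAA - 1."""
    rows = []
    for text in QUARTICS:
        plus, minus = (side.split() for side in text.split(" - "))
        row = dict.fromkeys(VARS, Fraction(0))
        for sign, mono in ((1, plus), (-1, minus)):
            for var in mono:
                term = Fraction(sign)
                for other in mono:
                    if other != var:
                        term *= q[other]
                row[var] += term
        rows.append([row[v] for v in VARS])
    rows.append([Fraction(int(v == "AAA")) for v in VARS])
    return rows


def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


q_ones = {v: Fraction(1) for v in VARS}
jac = jacobian(q_ones)
dropped = {"AAA", "CCA", "GGA", "TTA"}
keep = [k for k, v in enumerate(VARS) if v not in dropped]
j_prime = [[row[k] for k in keep] for row in jac[:-1]]  # the matrix J' of Remark 4.4
print(rank(jac), rank(j_prime))
```

At this point the full jacobian has rank 7 and $J'$ has rank 6. Other points of $(W_3)_+$ can be tested by replacing `q_ones` with the image of any positive parameter choice.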

Next, we want to describe a procedure to give a minimal system of generators for the ideal of $W_n$ around any point $q \in (W_n)_+$. Some of these generators are determined recursively from subtrees of $T$, while the remaining ones are easily inferred from some matrices to be defined later.

First we describe how these generators are to be constructed by induction on the number of leaves. Then, we will prove that the whole set of these polynomials generates a complete intersection which equals the variety $W_n$ in a neighborhood of any $q \in (W_n)_+$. The generators of this local complete intersection ideal will not depend on the point $q$, as we pointed out in Remark 4.3.

Generators of degree 4. As above, write $R = \mathbb{C}[Q_n]$ for the ring of polynomials in the unknowns $Q_n$. Following the idea of Chang [Cha96], write $v_1, \dots, v_n$ for the leaves of $T$. By reordering the leaves, we may assume that $v_{n-1}$ and $v_n$ form a cherry, i.e. are joined to a node $m$. Take the tree $T'$ with leaves $L(T') = L(T) \cup \{m\} - \{v_{n-1}, v_n\}$, interior nodes $N(T') = N(T) - \{m\}$ and edges $E(T') = E(T) - \{[m, v_{n-1}], [m, v_n]\}$, where $[m, n]$ is the edge containing the nodes $m$ and $n$ (see Figure 4). In virtue of Corollaries 3.9 and 3.10, the variety $W_{n-1}$ associated to $T'$ is the image of the polynomial map in (2.4)

$$\varphi_{n-1} : (\mathbb{C}^4)^{2n-5} \longrightarrow \mathbb{C}^{4^{n-2}}$$

and has dimension $3(2n - 5)$.
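
The passage from $T$ to $T'$ is a small combinatorial operation on the edge list. A minimal sketch follows, assuming the tree is stored as a list of unordered edges and that node labels such as `a`, `b`, `m` are hypothetical names chosen for this illustration.

```python
def collapse_cherry(edges, leaves, cherry):
    """Return (edges, leaves, m) of T' obtained from T by removing the cherry (a, b)."""
    a, b = cherry

    def neighbours(x):
        return {v for edge in edges if x in edge for v in edge if v != x}

    # The cherry node m is the unique common neighbour of the two leaves.
    common = neighbours(a) & neighbours(b)
    if len(common) != 1:
        raise ValueError("the two leaves do not form a cherry")
    (m,) = common
    new_edges = [edge for edge in edges if a not in edge and b not in edge]
    new_leaves = [leaf for leaf in leaves if leaf not in cherry] + [m]
    return new_edges, new_leaves, m


# A 5-leaf tree with interior nodes a, b, m; the leaves v4 and v5 form a cherry at m.
edges = [("v1", "a"), ("v2", "a"), ("a", "b"), ("v3", "b"),
         ("b", "m"), ("m", "v4"), ("m", "v5")]
leaves = ["v1", "v2", "v3", "v4", "v5"]

new_edges, new_leaves, m = collapse_cherry(edges, leaves, ("v4", "v5"))
print(len(edges), len(new_edges), new_leaves)
```

Here $T$ has $2n - 3 = 7$ edges and $T'$ has $2n - 5 = 5$, matching the exponent in the domain of $\varphi_{n-1}$.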

Assume that we have constructed a local complete intersection $\{g_1, \dots, g_{c(n-1)}\}$ at the points of $(W_{n-1})_+$ (equivalently, $\{g_1, \dots, g_{c(n-1)}\}$ generate the localization of the ideal $I_{T'}$ at the points of $(W_{n-1})_+$). The map $j_{n-1} : Q_{n-1} \to Q_n$ defined by $q_{x_1 \dots x_{n-1}} \mapsto q_{x_1 \dots x_{n-1} A}$ induces a ring homomorphism

$$\psi_{n-1} : \mathbb{C}[Q_{n-1}] \to R.$$

Write

$$J(n-1) = \{f_1^{(n-1)}, \dots, f_{c(n-1)-1}^{(n-1)}\} \subset R$$

for the set of polynomials being the image by $\psi_{n-1}$ of the generators $\{g_i\}$ other than $q_{A \dots A} - 1$.

Analogously, let $T''$ be the tree with 3 leaves determined by the vertices $v_1$, $v_{n-1}$ and $v_n$. The variety $W_3 \subset \mathbb{C}^{16}$ is the image of

$$\varphi_3 : (\mathbb{C}^4)^3 \longrightarrow \mathbb{C}^{16}$$

and has dimension 9. A complete system of generators $\{h_1, \dots, h_6\}$ of the ideal $I_{T''} \subset \mathbb{C}[Q_3]$ is given by Proposition 4.2. As above, the map $j_3 : Q_3 \to Q_n$ defined by $q_{xyz} \mapsto q_{xA \dots Ayz}$ induces a ring homomorphism

$$\psi_3 : \mathbb{C}[Q_3] \to R.$$

Write

$$J(3) = \{f_1^{(3)}, \dots, f_6^{(3)}\} \subset R$$

for the set of polynomials being the image by $\psi_3$ of $\{h_1, \dots, h_6\}$. The polynomials in $J(3)$ and $J(n-1)$ are quartics, but we still need to construct an extra set of polynomials of degree 2.

Generators of degree 2. Now, for each letter $z \in \Sigma$, write $M(z)$ for the $4 \times 4^{n-3}$-matrix with rows indexed by the couples $\{xy \mid x + y = z\}$, columns indexed by $\{x_1 \dots x_{n-2} \mid \sum_{i=1}^{n-2} x_i = z\}$ and whose $(xy, x_1 \dots x_{n-2})$-entry is precisely $q_{x_1 \dots x_{n-2} x y}$:

$$M(z) = \big( q_{x_1 \dots x_{n-2} x y} \big)_{xy,\ x_1 \dots x_{n-2}}.$$

For each of these matrices, take the set of the $3(4^{n-3} - 1)$ $2 \times 2$-minors containing $q_{zA \dots AzA}$: we obtain polynomials $F_i^{(z)} \in R$ of the form

(4.1) $$q_{x_1 \dots x_{n-2} x y}\, q_{zA \dots AzA} - q_{x_1 \dots x_{n-2} zA}\, q_{zA \dots A x y} = 0$$

for $i = 1, \dots, 3(4^{n-3} - 1)$. We get a total of $12(4^{n-3} - 1)$ polynomials. For each letter $z \in \Sigma$, write $K(z) = \{F_i^{(z)}\}$ for this set of polynomials and

$$K = \bigcup_{z \in \Sigma} K(z).$$
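
The construction of $K$ is mechanical and can be sketched in a few lines. The code below enumerates the minors of the form (4.1) for a given $n$, checks their number, and evaluates them on a point of the 4-leaf model. It assumes the encoding A, C, G, T = 0, 1, 2, 3 with XOR as the group law and the monomial parametrization $q_{x_1x_2x_3x_4} = P^1_{x_1}P^2_{x_2}P^3_{x_3}P^4_{x_4}P^5_{x_1+x_2}$ for the quartet tree with cherries $\{v_1, v_2\}$ and $\{v_3, v_4\}$; the parameter values are arbitrary.

```python
from fractions import Fraction
from itertools import product

LETTERS = "ACGT"  # position = element of Z_2 x Z_2, addition = XOR of positions


def words_with_sum(length, z):
    """All words x_1...x_length over ACGT whose letters add up to z."""
    out = []
    for idx in product(range(4), repeat=length):
        total = 0
        for i in idx:
            total ^= i
        if total == z:
            out.append("".join(LETTERS[i] for i in idx))
    return out


def quadrics(n):
    """K as a list of ((w1, w2), (w3, w4)), meaning q_w1 q_w2 - q_w3 q_w4."""
    K = []
    for z in range(4):
        pivot_col = LETTERS[z] + "A" * (n - 3)  # column index zA...A
        pivot_row = LETTERS[z] + "A"            # row index zA
        for col in words_with_sum(n - 2, z):
            for row in words_with_sum(2, z):
                if col != pivot_col and row != pivot_row:
                    K.append(((col + row, pivot_col + pivot_row),
                              (col + pivot_row, pivot_col + row)))
    return K


def quartet_point(P):
    """Point of the 4-leaf model; P[4] is the parameter of the interior edge."""
    q = {}
    for w in words_with_sum(4, 0):
        i = [LETTERS.index(c) for c in w]
        q[w] = P[0][i[0]] * P[1][i[1]] * P[2][i[2]] * P[3][i[3]] * P[4][i[0] ^ i[1]]
    return q


# Arbitrary positive parameters with P_A = 1 on each of the 5 edges.
P = [[Fraction(1)] + [Fraction(e + 1, e + x + 2) for x in range(1, 4)] for e in range(5)]
q = quartet_point(P)
K4 = quadrics(4)
residuals = [q[a] * q[b] - q[c] * q[d] for (a, b), (c, d) in K4]
print(len(K4), K4[0], all(r == 0 for r in residuals))
```

For $n = 4$ this yields $12(4^{n-3} - 1) = 36$ quadrics, the first of which is $q_{CCCC}q_{AAAA} - q_{CCAA}q_{AACC}$.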

Theorem 4.5. At each point $q \in (W_n)_+$, the ideal generated by the set

$$J(3) \cup J(n-1) \cup K$$

together with the equation $q_{A \dots A} = 1$ is a local complete intersection that defines the Kimura variety $W_n$ in a neighborhood of $q$.

Proof. First of all, direct computation shows that the number of polynomials being considered equals the codimension of the variety $W_n$, i.e.

$$|J(3)| + |J(n-1)| + |K| + 1 = 6 + (4^{n-2} - 6(n-1) + 8) + 12(4^{n-3} - 1) + 1 = 4^{n-1} - 6n + 9.$$

By [SS05] we know that the ideal $\mathfrak{p} = (q_{A \dots A} - 1, J(3), J(n-1), K)$ is contained in $I_{W_n}$ and so $W_n$ is contained in the variety $W'$ defined by $\mathfrak{p}$. We claim that $q$ is non-singular in $W'$. From this, we deduce that $W'$ is a local complete intersection at $q$ and, as we did in the proof of Proposition 4.2, we conclude that it is equal to $W_n$ in a neighborhood of $q$.

Now we prove that $q$ is a smooth point of $W'$. Write $Q_0 = \{q_{AA \dots AAA}, q_{CA \dots ACA}, q_{GA \dots AGA}, q_{TA \dots ATA}\}$, and notice that

$$Q_0 = j_3(Q_3) \cap j_{n-1}(Q_{n-1}).$$

Figure 4.

By reordering the rows and columns if necessary, we may assume that the jacobian matrix of $W'$ at $q$ has the form

$$\mathrm{Jac}_q(W') = \begin{pmatrix} B & 0 & 0 \\ * & J' & 0 \\ * & * & D \end{pmatrix}$$

where the $c(n-1) \times 4^{n-2}$-matrix $B$ equals the jacobian matrix $\mathrm{Jac}_{q_{n-1}}(W_{n-1})$ and $J'$ is the $6 \times 12$-matrix of Remark 4.4. In this way, the columns of the submatrix $B$ are indexed by the unknowns in $Q_{n-1}$ while the rows are indexed by the equations $\{q_{A \dots A} - 1\} \cup J(n-1)$. Similarly, the columns of $J'$ are indexed by $Q_3 \setminus \{q_{AAA}, q_{CCA}, q_{GGA}, q_{TTA}\}$ while the rows are indexed by $J(3)$. The columns of the matrix $D$ are indexed by the remaining unknowns while its rows are indexed by the equations $\{F_i^{(z)}\}_{z \in \Sigma}$. Each of these equations has the form (4.1) and so its partial derivative relative to the unknown $q_{x_1 \dots x_{n-2} x y}$ is equal to $q_{zA \dots AzA}$. Therefore, by reordering rows and columns if necessary, we may assume that the matrix $D$ is a diagonal matrix (and all its entries are strictly positive because $q \in (W_n)_+$).

In virtue of Remark 4.4 we know that $\mathrm{rank}(J') = 6$ and, by induction hypothesis, $B$ has maximal rank equal to $c(n-1) = 4^{n-2} - 6n + 15$. It follows that the matrix $\mathrm{Jac}_q(W')$ has maximal rank equal to

$$\mathrm{rank}(\mathrm{Jac}_q(W')) = 4^{n-2} - 6n + 15 + 6 + 12(4^{n-3} - 1) = 4^{n-1} - 6n + 9,$$

and we are done. □
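
The two arithmetic identities used in the proof (the count of generators and the rank of the block matrix) are easy to confirm for a range of $n$. The following sketch uses hypothetical helper names and only restates the formulas of Notation 4.1 and Theorem 4.5.

```python
def c(n):
    """Codimension of W_n (Notation 4.1)."""
    return 4 ** (n - 1) - 6 * n + 9


def generator_count(n):
    """|J(3)| + |J(n-1)| + |K| + 1 as in the proof of Theorem 4.5."""
    quartics_from_tripod = 6                # J(3)
    inherited = c(n - 1) - 1                # J(n-1): generators for T' except q_{A...A} - 1
    quadrics = 12 * (4 ** (n - 3) - 1)      # K
    return quartics_from_tripod + inherited + quadrics + 1


def block_rank(n):
    """rank(B) + rank(J') + rank(D) for the block-triangular jacobian."""
    return c(n - 1) + 6 + 12 * (4 ** (n - 3) - 1)


for n in range(4, 11):
    print(n, c(n), generator_count(n), block_rank(n), 4 ** (n - 1) - c(n))
```

For $n = 4$ this gives 49 equations, i.e. the 48 invariants of Example 4.8 together with $q_{AAAA} - 1$, and the last column recovers the dimension $3(2n - 3)$.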

Remark 4.6. The set of quadrics $K$ contains the information of invariants coming from the splits of the tree (see [SS05] and [Eri05]). Although in theory a tree can be reconstructed from its splits (see [Eri05], Theorem 19.14), the variety defined by $K$ is much bigger than $W$ because it has codimension $12(4^{n-3} - 1)$.



Remark 4.7. In [CFS07], we studied a phylogenetic reconstruction method (already introduced in [CGS05]) which was based on a set of generators of the ideal associated to the Kimura model. The simulation studies performed there showed that it is actually a very competitive and highly efficient method. In the case of 4-leaved trees and for the Kimura 3-parameter model, a minimal system of generators for the corresponding ideal consists of 8002 polynomials of degrees 2, 3 and 4. Because of the results of this paper, it is enough to deal with the 48 invariants listed in the following example (or in general, as many invariants as the codimension of the variety $W_n$). This leads to a substantial improvement in the efficiency and effectiveness of the method. Simulation studies on this variant of the method can be seen on the webpage



http://wwww.mal.upc.edu/ -jfernandez/ci.html 



and the reader should contrast them to [CFS07]. Moreover, the fact that we provide the smallest set of local generators in Theorem 4.5 gives some hope for the generalization of phylogenetic reconstruction methods based on algebraic geometry to trees with a large number of leaves.



Example 4.8. Let $T$ be the unrooted 4-leaved tree of Figure 2(b). The above procedure gives rise to the following 48 invariants: 36 quadrics

$$\begin{array}{lll}
q_{CCCC}q_{AAAA} - q_{CCAA}q_{AACC}, & q_{GGCC}q_{AAAA} - q_{GGAA}q_{AACC}, & q_{TTCC}q_{AAAA} - q_{TTAA}q_{AACC}, \\
q_{CCGG}q_{AAAA} - q_{CCAA}q_{AAGG}, & q_{GGGG}q_{AAAA} - q_{GGAA}q_{AAGG}, & q_{TTGG}q_{AAAA} - q_{TTAA}q_{AAGG}, \\
q_{CCTT}q_{AAAA} - q_{CCAA}q_{AATT}, & q_{GGTT}q_{AAAA} - q_{GGAA}q_{AATT}, & q_{TTTT}q_{AAAA} - q_{TTAA}q_{AATT}, \\
q_{ACAC}q_{CACA} - q_{ACCA}q_{CAAC}, & q_{GTAC}q_{CACA} - q_{GTCA}q_{CAAC}, & q_{TGAC}q_{CACA} - q_{TGCA}q_{CAAC}, \\
q_{ACGT}q_{CACA} - q_{ACCA}q_{CAGT}, & q_{GTGT}q_{CACA} - q_{GTCA}q_{CAGT}, & q_{TGGT}q_{CACA} - q_{TGCA}q_{CAGT}, \\
q_{ACTG}q_{CACA} - q_{ACCA}q_{CATG}, & q_{GTTG}q_{CACA} - q_{GTCA}q_{CATG}, & q_{TGTG}q_{CACA} - q_{TGCA}q_{CATG}, \\
q_{AGAG}q_{GAGA} - q_{AGGA}q_{GAAG}, & q_{CTAG}q_{GAGA} - q_{CTGA}q_{GAAG}, & q_{TCAG}q_{GAGA} - q_{TCGA}q_{GAAG}, \\
q_{AGCT}q_{GAGA} - q_{AGGA}q_{GACT}, & q_{CTCT}q_{GAGA} - q_{CTGA}q_{GACT}, & q_{TCCT}q_{GAGA} - q_{TCGA}q_{GACT}, \\
q_{AGTC}q_{GAGA} - q_{AGGA}q_{GATC}, & q_{CTTC}q_{GAGA} - q_{CTGA}q_{GATC}, & q_{TCTC}q_{GAGA} - q_{TCGA}q_{GATC}, \\
q_{ATAT}q_{TATA} - q_{ATTA}q_{TAAT}, & q_{CGAT}q_{TATA} - q_{CGTA}q_{TAAT}, & q_{GCAT}q_{TATA} - q_{GCTA}q_{TAAT}, \\
q_{ATCG}q_{TATA} - q_{ATTA}q_{TACG}, & q_{CGCG}q_{TATA} - q_{CGTA}q_{TACG}, & q_{GCCG}q_{TATA} - q_{GCTA}q_{TACG}, \\
q_{ATGC}q_{TATA} - q_{ATTA}q_{TAGC}, & q_{CGGC}q_{TATA} - q_{CGTA}q_{TAGC}, & q_{GCGC}q_{TATA} - q_{GCTA}q_{TAGC},
\end{array}$$

and 12 quartics

$$\begin{array}{l}
q_{AAAA}q_{ATTA}q_{TCGA}q_{TGCA} - q_{ACCA}q_{AGGA}q_{TATA}q_{TTAA}, \\
q_{CCAA}q_{CTGA}q_{TATA}q_{TGCA} - q_{CACA}q_{CGTA}q_{TCGA}q_{TTAA}, \\
q_{AGGA}q_{ATTA}q_{CACA}q_{CCAA} - q_{AAAA}q_{ACCA}q_{CGTA}q_{CTGA}, \\
q_{ACCA}q_{ATTA}q_{GAGA}q_{GGAA} - q_{AAAA}q_{AGGA}q_{GCTA}q_{GTCA}, \\
q_{CACA}q_{CTGA}q_{GCTA}q_{GGAA} - q_{CCAA}q_{CGTA}q_{GAGA}q_{GTCA}, \\
q_{GGAA}q_{GTCA}q_{TATA}q_{TCGA} - q_{GAGA}q_{GCTA}q_{TGCA}q_{TTAA}, \\
q_{AAAA}q_{AATT}q_{TACG}q_{TAGC} - q_{AACC}q_{AAGG}q_{TAAT}q_{TATA}, \\
q_{CACA}q_{CATG}q_{TAAT}q_{TAGC} - q_{CAAC}q_{CAGT}q_{TACG}q_{TATA}, \\
q_{AAGG}q_{AATT}q_{CAAC}q_{CACA} - q_{AAAA}q_{AACC}q_{CAGT}q_{CATG}, \\
q_{AACC}q_{AATT}q_{GAAG}q_{GAGA} - q_{AAAA}q_{AAGG}q_{GACT}q_{GATC}, \\
q_{CAAC}q_{CATG}q_{GACT}q_{GAGA} - q_{CACA}q_{CAGT}q_{GAAG}q_{GATC}, \\
q_{GAGA}q_{GATC}q_{TAAT}q_{TACG} - q_{GAAG}q_{GACT}q_{TAGC}q_{TATA}.
\end{array}$$
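
The 12 quartics of the example are the images of $h_1, \dots, h_6$ under the two index embeddings $j_{n-1}$ (append the letter A) and $j_3$ (insert $n - 3$ letters A after the first index). The sketch below applies both embeddings for $n = 4$ and evaluates the results on a point of the 4-leaf model. It assumes the encoding A, C, G, T = 0, 1, 2, 3 with XOR as the group law and the parametrization $q_{x_1x_2x_3x_4} = P^1_{x_1}P^2_{x_2}P^3_{x_3}P^4_{x_4}P^5_{x_1+x_2}$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

LETTERS = "ACGT"
QUARTICS = [  # h_1..h_6 of Proposition 4.2, written as "monomial - monomial"
    "AAA ATT TCG TGC - ACC AGG TAT TTA",
    "CCA CTG TAT TGC - CAC CGT TCG TTA",
    "AGG ATT CAC CCA - AAA ACC CGT CTG",
    "ACC ATT GAG GGA - AAA AGG GCT GTC",
    "CAC CTG GCT GGA - CCA CGT GAG GTC",
    "GGA GTC TAT TCG - GAG GCT TGC TTA",
]


def j_prev(word):
    """Index map of j_{n-1}: x_1...x_{n-1} -> x_1...x_{n-1}A."""
    return word + "A"


def j3(word, n):
    """Index map of j_3: xyz -> xA...Ayz with n - 3 inserted letters A."""
    return word[0] + "A" * (n - 3) + word[1:]


def embed(quartic, index_map):
    """Apply an index map to every unknown of a binomial."""
    return " - ".join(
        " ".join(index_map(w) for w in side.split())
        for side in quartic.split(" - ")
    )


def quartet_point(P):
    """Point of the 4-leaf model; P[4] is the parameter of the interior edge."""
    q = {}
    for i, j, k in product(range(4), repeat=3):
        l = i ^ j ^ k
        word = LETTERS[i] + LETTERS[j] + LETTERS[k] + LETTERS[l]
        q[word] = P[0][i] * P[1][j] * P[2][k] * P[3][l] * P[4][i ^ j]
    return q


def evaluate(binomial, q):
    values = []
    for side in binomial.split(" - "):
        value = Fraction(1)
        for word in side.split():
            value *= q[word]
        values.append(value)
    return values[0] - values[1]


n = 4
from_subtree = [embed(h, j_prev) for h in QUARTICS]            # J(n-1)
from_tripod = [embed(h, lambda w: j3(w, n)) for h in QUARTICS]  # J(3)

# Arbitrary positive parameters with P_A = 1 on each of the 5 edges.
P = [[Fraction(1)] + [Fraction(e + 1, e + x + 2) for x in range(1, 4)] for e in range(5)]
q = quartet_point(P)
print(from_subtree[0])
print(from_tripod[0])
print(all(evaluate(b, q) == 0 for b in from_subtree + from_tripod))
```

The first embedded binomials are the first and seventh quartics listed in the example.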